
Conversation

@caozhiyuan (Contributor) commented on Nov 19, 2025:

This pull request introduces support for "thinking" blocks and signatures in the Anthropic message translation pipeline, allowing richer reasoning metadata to be passed between Anthropic and OpenAI message formats. The main changes include updates to the message interfaces, translation logic, and streaming event handling to properly process and emit reasoning text and opaque signatures. The tokenizer is also updated to exclude reasoning metadata from token counts.

Anthropic "thinking" block and signature support:

  • Added signature field to the AnthropicThinkingBlock interface, and updated all relevant translation logic to extract and emit reasoning_text and reasoning_opaque from OpenAI messages into Anthropic "thinking" blocks.

  • Updated streaming translation logic to emit "thinking" blocks and their signatures as separate content blocks, including handling for starting, updating, and stopping these blocks in the event stream, as illustrated below.
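For reference, one streamed thinking block maps to this Anthropic event order (an illustrative sketch with placeholder values; the shapes follow the Anthropic Messages streaming protocol):

```typescript
// Illustrative event order for a single thinking block in the Anthropic stream
const thinkingBlockEvents = [
  { type: "content_block_start", index: 0, content_block: { type: "thinking", thinking: "" } },
  { type: "content_block_delta", index: 0, delta: { type: "thinking_delta", thinking: "reasoning text..." } },
  { type: "content_block_delta", index: 0, delta: { type: "signature_delta", signature: "<opaque signature>" } },
  { type: "content_block_stop", index: 0 },
]
```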

State and event handling improvements:

  • Added thinkingBlockOpen state to AnthropicStreamState and ensured correct block lifecycle management for both text and thinking blocks in streaming translation.

Message content mapping and translation fixes:

  • Refactored content mapping to separate text and thinking blocks, and ensured reasoning metadata is correctly attached to assistant messages for both streaming and non-streaming translation.

Tokenizer exclusion for reasoning metadata:

  • Updated the tokenizer to skip counting tokens for the reasoning_opaque field, preventing it from affecting token usage calculations. A minimal sketch of the exclusion follows.
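A minimal sketch of that exclusion, assuming a gpt-tokenizer-style encode(); the project's actual calculateMessageTokens may differ in detail:

```typescript
import { encode } from "gpt-tokenizer"

// Count tokens for a message's string fields, skipping opaque reasoning metadata
function countMessageTokens(message: Record<string, unknown>): number {
  let tokens = 0
  for (const [key, value] of Object.entries(message)) {
    if (key === "reasoning_opaque") continue // opaque signatures are excluded from usage
    if (typeof value === "string") tokens += encode(value).length
  }
  return tokens
}
```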

Miscellaneous and infrastructure:

  • Set Bun server idleTimeout to 0 for improved server stability.

Summary by CodeRabbit

  • New Features

    • Assistant messages can include visible reasoning_text and opaque signatures; translations expose a thinking_budget and streaming can track open thinking blocks.
  • Chores

    • Server idle timeout disabled for more reliable streams.
    • Token counting now ignores opaque reasoning entries.
    • Model metadata may include min/max thinking budgets.
    • API/client version identifiers updated.
  • Tests

    • Tests updated to validate new reasoning fields and streaming state.


@coderabbitai bot commented on Nov 19, 2025:

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces thinking-block metadata and a streaming flag, separates reasoning into reasoning_text and reasoning_opaque (signature), refactors stream translation into modular handlers with explicit block lifecycle and delta sequencing, exposes optional reasoning fields in copilot types, excludes reasoning_opaque from tokenization, updates tests, and sets Bun idleTimeout to 0.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Anthropic types (src/routes/messages/anthropic-types.ts) | Added signature: string to AnthropicThinkingBlock and thinkingBlockOpen: boolean to AnthropicStreamState. |
| Handler init (src/routes/messages/handler.ts) | Initialize streamState with thinkingBlockOpen: false. |
| Stream translation (src/routes/messages/stream-translation.ts) | Replaced inline chunk processing with modular handlers (handleMessageStart, handleThinkingText, handleContent, handleToolCalls, handleFinish, handleReasoningOpaque, closeThinkingBlockIfOpen, etc.); added explicit block lifecycle, delta sequencing, and tracking for thinking/content/tool_use blocks. |
| Non-stream translation (src/routes/messages/non-stream-translation.ts) | Derive modelId, compute thinking_budget via getThinkingBudget, separate reasoning into reasoning_text and reasoning_opaque (signature), add getAnthropicThinkBlocks, and avoid embedding thinking blocks into plain text; signatures are propagated into translations. |
| Copilot types / API (src/services/copilot/create-chat-completions.ts) | Delta and Message gain optional reasoning_text?: string and reasoning_opaque?: string fields. |
| Tokenizer (src/lib/tokenizer.ts) | calculateMessageTokens now skips the reasoning_opaque key so its value is omitted from token counting/encoding. |
| Get models (src/services/copilot/get-models.ts) | ModelSupports extended with max_thinking_budget?: number and min_thinking_budget?: number. |
| Tests (tests/anthropic-request.test.ts, tests/anthropic-response.test.ts) | Updated to include signature in thinking blocks, assert reasoning_text for thinking content, and initialize thinkingBlockOpen in stream tests. |
| Server config (src/start.ts) | Set bun: { idleTimeout: 0 } in the serve configuration. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler
    participant Translator as StreamTranslator
    participant Emitter as EventEmitter

    Client->>Handler: send streaming chunk
    Handler->>Translator: translateChunkToAnthropicEvents(chunk, state)

    alt chunk has reasoning_text
        Translator->>Emitter: content_block_start (thinking)
        Translator->>Emitter: thinking_delta (reasoning_text)
        note right of Translator: thinkingBlockOpen = true
    end

    alt chunk has reasoning_opaque (signature)
        Translator->>Translator: closeThinkingBlockIfOpen()
        Translator->>Emitter: signature_delta (reasoning_opaque)
        Translator->>Emitter: content_block_stop (thinking)
        note right of Translator: thinkingBlockOpen = false
    end

    alt chunk has text or tool_calls
        Translator->>Emitter: handleContent() / handleToolCalls() -> text_delta / tool_use deltas
    end

    alt stream finishes
        Translator->>Translator: handleFinish() (close blocks, emit stop_reason)
        Translator->>Emitter: message_delta + message_stop
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • Attention areas:
    • src/routes/messages/stream-translation.ts — handler ordering, delta sequencing, block lifecycle, and state transitions.
    • src/routes/messages/non-stream-translation.ts — thinking_budget computation, modelId propagation, and reasoning extraction/mapping.
    • src/services/copilot/create-chat-completions.ts — new optional reasoning fields and payload surface changes.
    • src/lib/tokenizer.ts — ensure safe exclusion of reasoning_opaque from token counts.
    • Tests — verify updated assertions and stream state initialization.

Poem

🐇 I nibble at deltas, tuck signatures neat,
Thinking blocks open, then close on quick feet,
Reasoning split clean — text and a hush,
Tokens skip secrets in a silent rush,
— A rabbit hops on, pleased with the patch 🌿

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33%, below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title is vague and lists multiple features ('gemini-3-pro', 'claude model', 'thinking') without prioritizing the main change, making the primary objective hard to grasp at a glance. | Revise the title to focus on the primary change, e.g. 'Add reasoning and thinking block support for Anthropic/OpenAI translation' or 'Support chat completions reasoning blocks across model providers'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |

@caozhiyuan (Contributor, Author) commented:

@ericc-ch This is the chat completions reasoning feature; gemini-3-pro works OK. Please help review.

@coderabbitai bot left a comment:

Actionable comments posted: 1

🧹 Nitpick comments (7)
tests/anthropic-response.test.ts (1)

250-256: AnthropicStreamState initialization updated correctly; consider DRY helper

Adding thinkingBlockOpen: false keeps the test state in sync with the new AnthropicStreamState shape. You now duplicate the full initial state literal in multiple tests; consider a small helper like createInitialStreamState() to avoid future drift when adding more fields.

Also applies to: 351-357
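A sketch of such a helper (a hypothetical name; the field names are taken from the stream-state usage described elsewhere in this review, and the real AnthropicStreamState may carry more fields):

```typescript
function createInitialStreamState(): AnthropicStreamState {
  return {
    messageStartSent: false,
    contentBlockIndex: 0,
    contentBlockOpen: false,
    thinkingBlockOpen: false,
    toolCalls: {},
  }
}
```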

src/routes/messages/anthropic-types.ts (1)

56-60: Thinking block signature and signature_delta wiring looks consistent, but type is stricter than usage

Making AnthropicThinkingBlock carry a signature: string and adding the signature_delta variant to AnthropicContentBlockDeltaEvent matches how the stream translator now emits signatures and how non‑stream translation constructs thinking blocks.

One nuance: elsewhere you defensively treat signature as optional/possibly empty (e.g., checking b.signature && b.signature.length > 0 and sometimes emitting signature: ""). If the upstream protocol allows thinking blocks without a signature, you might want signature?: string to better reflect the wire shape, instead of relying on empty strings as a sentinel.

Also applies to: 140-148
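If the wire shape does allow signature-less thinking blocks, the type could express that directly rather than using "" as a sentinel (a sketch):

```typescript
interface AnthropicThinkingBlock {
  type: "thinking"
  thinking: string
  signature?: string // absent, rather than "", when no signature was produced
}
```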

tests/anthropic-request.test.ts (1)

128-158: Tests exercise reasoning_text but not reasoning_opaque mapping

The two thinking‑block tests now correctly:

  • Include signature on Anthropic thinking blocks.
  • Assert that reasoning_text is populated while content only carries user‑visible text.

However, neither test asserts that the Anthropic signature ends up on assistantMessage.reasoning_opaque. Adding something like:

expect(assistantMessage?.reasoning_opaque).toBe("abc123")

and

expect(assistantMessage?.reasoning_opaque).toBe("def456")

would close the coverage gap for the opaque reasoning metadata as well. You might also tweak the test comments to clarify that thinking content is surfaced via reasoning_text, not merged into content.

Also applies to: 160-201

src/services/copilot/create-chat-completions.ts (1)

72-86: Reasoning fields and exported streaming types are wired consistently

Exporting Delta and Choice and adding reasoning_text / reasoning_opaque to both streaming deltas and non‑stream messages matches how the translation layers consume them. The translator code only treats these as optional and nullable, which is consistent with the type definitions here.

If you want to tighten typings later, you could replace object for logprobs with a more specific shape or Record<string, unknown>, but that’s not blocking.

Also applies to: 88-93, 114-120, 166-175

src/routes/messages/non-stream-translation.ts (2)

274-296: getAnthropicThinkBlocks drops opaque signature when both text and opaque are present

In translateToAnthropic, thinking blocks are reconstructed from:

  • choice.message.reasoning_text
  • choice.message.reasoning_opaque

via getAnthropicThinkBlocks. Right now that helper does:

if (reasoningText) {
  return [{ type: "thinking", thinking: reasoningText, signature: "" }]
}
if (reasoningOpaque) {
  return [{ type: "thinking", thinking: "", signature: reasoningOpaque }]
}

So when both reasoning_text and reasoning_opaque are present, the opaque portion is silently discarded for the non‑stream Anthropic response. If upstream ever sends both fields together, you likely want to preserve the signature as well, e.g., by returning a single block with both populated or two blocks (one text, one signature).

Suggestion: adjust getAnthropicThinkBlocks to handle the “both present” case explicitly instead of treating them as mutually exclusive.

Also applies to: 320-359
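One possible shape for that fix, keeping both fields on a single block (a sketch; the parameter names are assumed):

```typescript
function getAnthropicThinkBlocks(
  reasoningText?: string | null,
  reasoningOpaque?: string | null,
): Array<AnthropicThinkingBlock> {
  if (!reasoningText && !reasoningOpaque) {
    return []
  }
  // Preserve text and signature together instead of treating them as exclusive
  return [
    {
      type: "thinking",
      thinking: reasoningText ?? "",
      signature: reasoningOpaque ?? "",
    },
  ]
}
```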


274-296: Multi-choice stop reason precedence is a bit opaque

stopReason is initialized from the first choice and then updated with:

if (choice.finish_reason === "tool_calls" || stopReason === "stop") {
  stopReason = choice.finish_reason
}

This causes a later non‑tool reason (e.g., "length") to overwrite an earlier "stop" reason, which may not be what you intend. If the goal is simply “prefer tool_calls over everything else,” a more direct rule (e.g., only overwrite when choice.finish_reason === "tool_calls") would be easier to reason about.

Not a blocker, but worth double‑checking the intended precedence.
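A sketch of that more direct rule (response here is the OpenAI-style response being translated):

```typescript
// Only "tool_calls" may override the stop reason taken from the first choice
let stopReason = response.choices[0].finish_reason
for (const choice of response.choices) {
  if (choice.finish_reason === "tool_calls") {
    stopReason = "tool_calls"
    break
  }
}
```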

src/routes/messages/stream-translation.ts (1)

227-264: Consider advancing contentBlockIndex after handleReasoningOpaque emits a full thinking block

handleReasoningOpaque emits a complete thinking block lifecycle:

  • content_block_start (type "thinking")
  • a dummy thinking_delta
  • a signature_delta
  • content_block_stop

but it never updates state.contentBlockIndex or any open/closed flags. As a result, the next text or tool block may reuse the same index as this ephemeral thinking block, which can make the event stream harder to reason about for consumers that treat indices as unique per block.

A low‑impact improvement would be to increment state.contentBlockIndex at the end of handleReasoningOpaque (and keep contentBlockOpen / thinkingBlockOpen false), so each emitted block gets a distinct index.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ea08fe and a2467d3.

📒 Files selected for processing (7)
  • src/routes/messages/anthropic-types.ts (2 hunks)
  • src/routes/messages/handler.ts (1 hunks)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
  • src/routes/messages/stream-translation.ts (3 hunks)
  • src/services/copilot/create-chat-completions.ts (4 hunks)
  • tests/anthropic-request.test.ts (4 hunks)
  • tests/anthropic-response.test.ts (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/anthropic-types.ts
  • src/routes/messages/stream-translation.ts
  • src/routes/messages/non-stream-translation.ts
  • src/routes/messages/handler.ts
  • tests/anthropic-request.test.ts
🧬 Code graph analysis (2)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)
src/routes/messages/non-stream-translation.ts (1)
src/routes/messages/anthropic-types.ts (3)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
  • AnthropicThinkingBlock (56-60)
🔇 Additional comments (4)
src/routes/messages/anthropic-types.ts (1)

195-208: New thinkingBlockOpen flag in stream state aligns with streaming helpers

Adding thinkingBlockOpen: boolean to AnthropicStreamState matches how stream-translation.ts now manages open thinking blocks separately from regular content blocks. The field is initialized in both the handler and tests, so state shape is consistent across call sites.

src/routes/messages/handler.ts (1)

56-64: Streaming state now correctly initializes thinkingBlockOpen

Including thinkingBlockOpen: false in the handler’s streamState keeps the runtime state in sync with AnthropicStreamState and the streaming translator’s expectations. No issues here.

src/routes/messages/non-stream-translation.ts (1)

126-180: Assistant reasoning is cleanly separated from visible content

handleAssistantMessage + the updated mapContent now:

  • Aggregate all thinking blocks into allThinkingContent and expose it via reasoning_text.
  • Select a single non‑empty signature into reasoning_opaque.
  • Ensure content only contains text (and images when present), with thinking and tool blocks filtered out.

This achieves the PR goal of keeping reasoning separate from user‑visible text and structuring tool calls via tool_calls. The mapContent filter to only include text blocks for non‑image content is particularly nice for avoiding accidental leakage of non‑display blocks.

Also applies to: 182-222

src/routes/messages/stream-translation.ts (1)

23-47: Streaming translator refactor improves readability and separation of concerns

The split of translateChunkToAnthropicEvents into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, and handleFinish makes the streaming pipeline much easier to follow. State mutations (messageStartSent, contentBlockIndex, contentBlockOpen, thinkingBlockOpen, toolCalls) are now localized, which should reduce future bugs around mixing text/tool/thinking content.

Also applies to: 194-225

@coderabbitai bot left a comment:

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/start.ts (1)

120-122: Disabling Bun idle timeout is helpful for streaming but has resource trade‑offs

Setting bun.idleTimeout = 0 will keep idle keep‑alive connections open indefinitely, which is good for long‑running reasoning/streaming but can retain dead or abandoned connections and increase resource usage under load.

Consider:

  • Making the idle timeout configurable via an env var (e.g., BUN_IDLE_TIMEOUT_SECONDS, defaulting to 0 or a high but finite value; see the sketch after this list), and/or
  • Documenting that 0 is intentional for long‑lived streaming so operators know the impact if this is ever used in a shared/production setting.
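A minimal sketch of the env-driven variant, reusing the serve(...) call shape from src/start.ts (BUN_IDLE_TIMEOUT_SECONDS is a hypothetical name; Bun interprets idleTimeout in seconds, with 0 disabling it):

```typescript
const idleTimeout = Number(process.env.BUN_IDLE_TIMEOUT_SECONDS ?? "0")

serve({
  fetch: server.fetch as ServerHandler,
  port: options.port,
  bun: { idleTimeout },
})
```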
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2467d3 and 6c385b7.

📒 Files selected for processing (2)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
  • src/start.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (1)
src/routes/messages/anthropic-types.ts (3)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
  • AnthropicThinkingBlock (56-60)
🔇 Additional comments (4)
src/routes/messages/non-stream-translation.ts (4)

146-153: LGTM!

The extraction of thinking content and signature from multiple thinking blocks is well-handled. Concatenating thinking content and taking the first non-empty signature is a reasonable approach for the translation.


155-179: LGTM!

The reasoning fields are correctly added to assistant messages in both code paths (with and without tool calls), ensuring consistent handling of thinking block data.


194-200: LGTM!

The explicit filter for text blocks ensures that thinking blocks are excluded from regular content, which is correct since they're now handled separately via reasoning fields.


278-296: LGTM!

The refactor to use a unified assistantContentBlocks array that includes text, thinking, and tool use blocks is cleaner and correctly preserves the stop reason logic.

@caozhiyuan force-pushed the feature/chat-completions-reasoning branch from 6c385b7 to 65a3568 on November 19, 2025 13:56
@coderabbitai bot left a comment:

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/routes/messages/non-stream-translation.ts (1)

278-291: Anthropic response reconstruction is correct; consider ordering and naming nits

Combining text, thinking, and tool_use blocks into assistantContentBlocks using the new reasoning fields is logically sound. Two optional cleanups you might consider:

  • Put thingBlocks (thinking) before textBlocks to better mirror common Anthropic ordering (thinking preceding final text), if that matches how other paths structure content.
  • Rename thingBlocks → thinkingBlocks for clarity.

Both are non-blocking and mostly about readability/consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c385b7 and 65a3568.

📒 Files selected for processing (1)
  • src/routes/messages/non-stream-translation.ts (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (1)
src/routes/messages/anthropic-types.ts (3)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
  • AnthropicThinkingBlock (56-60)
🔇 Additional comments (3)
src/routes/messages/non-stream-translation.ts (3)

142-179: Reasoning metadata extraction in assistant messages looks solid; confirm multi-signature expectations

The separation of thinking content into reasoning_text/reasoning_opaque while keeping content free of thinking blocks is correct and symmetric with the Anthropic→OpenAI path. One behavioral detail: when multiple thinking blocks carry different signature values, only the first non-empty signature is propagated. If upstream could ever emit multiple distinct signatures per message, that information would be collapsed here; otherwise this is fine as-is.


194-201: mapContent correctly strips non-text (including thinking) from visible content

The updated non-image path returning only joined text blocks is a good change: thinking and other non-text blocks are no longer mixed into the user-visible content, which aligns with the new dedicated reasoning fields and keeps Anthropic internals out of the OpenAI surface.


336-359: getAnthropicThinkBlocks now preserves both reasoning fields as intended

This helper correctly emits a single AnthropicThinkingBlock whenever either reasoningText or reasoningOpaque is non-empty, and preserves both when both are present (thinking: reasoningText, signature: reasoningOpaque || ""). That resolves the earlier data-loss concern around dropping reasoning_opaque and keeps the types aligned with AnthropicThinkingBlock.

@coderabbitai bot left a comment:

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/routes/messages/stream-translation.ts (1)

23-47: Standalone reasoning_opaque deltas are not processed

If a delta contains only reasoning_opaque (no reasoning_text, tool_calls, content, or finish_reason), it will be silently ignored:

  • handleThinkingText only processes reasoning_opaque when reasoning_text is present (line 300)
  • handleToolCalls calls handleReasoningOpaque only when tool_calls exist (line 105)
  • handleFinish calls handleReasoningOpaque only when finish_reason exists (line 68)

Per the streaming protocol, deltas may carry only opaque reasoning tokens without other fields (as noted in the previous review's web search).

Option 1: Unconditionally process reasoning_opaque in handleThinkingText, regardless of whether reasoning_text is present:

 function handleThinkingText(
   delta: Delta,
   state: AnthropicStreamState,
   events: Array<AnthropicStreamEventData>,
 ) {
   if (delta.reasoning_text && delta.reasoning_text.length > 0) {
     if (!state.thinkingBlockOpen) {
       events.push({
         type: "content_block_start",
         index: state.contentBlockIndex,
         content_block: {
           type: "thinking",
           thinking: "",
         },
       })
       state.thinkingBlockOpen = true
     }

     events.push({
       type: "content_block_delta",
       index: state.contentBlockIndex,
       delta: {
         type: "thinking_delta",
         thinking: delta.reasoning_text,
       },
     })
+  }

-    if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+  // Handle reasoning_opaque even if there's no reasoning_text
+  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+    if (state.thinkingBlockOpen) {
+      // Close open thinking block with signature
       events.push(
         {
           type: "content_block_delta",
           index: state.contentBlockIndex,
           delta: {
             type: "signature_delta",
             signature: delta.reasoning_opaque,
           },
         },
         {
           type: "content_block_stop",
           index: state.contentBlockIndex,
         },
       )
       state.contentBlockIndex++
       state.thinkingBlockOpen = false
+    } else {
+      // No thinking block open - create complete reasoning block
+      handleReasoningOpaque(delta, events, state)
     }
   }
 }

Option 2: Add standalone check in main function before handleFinish:

   handleContent(delta, state, events)

   handleToolCalls(delta, state, events)

+  // Handle standalone reasoning_opaque not caught by other handlers
+  if (
+    delta.reasoning_opaque &&
+    !state.thinkingBlockOpen &&
+    !state.contentBlockOpen
+  ) {
+    handleReasoningOpaque(delta, events, state)
+  }

   handleFinish(choice, state, { events, chunk })

I recommend Option 1 as it centralizes all reasoning handling in one place.

♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)

49-94: Thinking blocks must be closed before message finishes

If state.thinkingBlockOpen is true when finish_reason arrives, the thinking block is never explicitly closed before message_stop. Per streaming protocol, thinking/content blocks must receive an explicit content_block_stop event before the message finishes.

Scenario:

  1. Delta with reasoning_text opens thinking block (thinkingBlockOpen = true)
  2. Final delta with only finish_reason (no reasoning_opaque, no other content)
  3. handleFinish closes contentBlockOpen but leaves thinkingBlockOpen = true
  4. message_stop is emitted without closing the thinking block

Apply this diff:

 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
     }

     handleReasoningOpaque(choice.delta, events, state)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65a3568 and ec66480.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)
🔇 Additional comments (4)
src/routes/messages/stream-translation.ts (4)

1-47: Refactoring improves code organization

The modular handler approach (splitting into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, handleFinish) makes the streaming translation logic clearer and easier to maintain. The imports of Choice and Delta types are appropriate.


96-199: Content and tool call handling looks correct

The handleToolCalls and handleContent functions properly invoke closeThinkingBlockIfOpen before processing their respective content types, ensuring thinking blocks are closed at the right boundaries. The block lifecycle management (opening, emitting deltas, closing, incrementing indices) follows the expected pattern.


201-232: Message start handling is correct

The handleMessageStart function correctly emits the message_start event once with proper initialization of token counters and cache metadata.


321-343: Helper function correctly manages thinking block closure

The closeThinkingBlockIfOpen helper properly emits signature_delta with an empty string and content_block_stop, then increments the index and resets the state flag. This is the correct pattern for closing thinking blocks.

@caozhiyuan force-pushed the feature/chat-completions-reasoning branch from ec66480 to 9adc6e0 on November 19, 2025 15:44
@coderabbitai bot left a comment:

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/routes/messages/stream-translation.ts (2)

269-315: Thinking text streaming generally coherent, but tightly coupled to finish logic

The handleThinkingText logic for opening/extending a thinking block and optionally attaching a signature_delta when reasoning_opaque is present is consistent on its own, and it correctly bumps contentBlockIndex and resets thinkingBlockOpen on closure.

The main caveats are its interaction with handleFinish (thinking block not closed when only finish_reason arrives, and potential double-signature when both reasoning_text and reasoning_opaque appear on the final chunk), which I’ve detailed in the handleFinish comment. Any changes there should be coordinated with this function to avoid duplicated or missing thinking closures.


49-94: Thinking block may remain open at finish and reasoning_opaque can be double-emitted

Two intertwined issues around finish handling:

  1. Open thinking block not closed on finish

    • handleFinish only closes state.contentBlockOpen.
    • If you’ve streamed one or more reasoning_text chunks (so state.thinkingBlockOpen === true) and then receive a final chunk that carries only finish_reason (no content, no tool_calls, no reasoning_opaque), the thinking block never gets a content_block_stop before message_stop. closeThinkingBlockIfOpen is never called from here.
  2. Double processing of reasoning_opaque on final chunks

    • handleThinkingText can already consume reasoning_opaque when it appears together with reasoning_text, emitting a signature_delta and content_block_stop and advancing contentBlockIndex.
    • handleFinish then unconditionally calls handleReasoningOpaque(choice.delta, …). For a final delta that contains both reasoning_text and reasoning_opaque, this opens a second thinking block with an empty thinking_delta plus another signature_delta, effectively duplicating the signature on a new block.

These behaviors match a previously raised concern and still look fragile with respect to upstream reasoning delta shapes.

A safer approach would be along the lines of:

  • In handleFinish, before emitting the message_delta/message_stop, explicitly close any open thinking block, ideally attaching choice.delta.reasoning_opaque to that block (if present) rather than starting a new one.
  • Only fall back to handleReasoningOpaque when there was no open thinking block for this delta (e.g., a reasoning-opaque-only final chunk).
  • Ensure that chunks which carry only finish_reason still cause a thinking block to be closed if state.thinkingBlockOpen is true.

This keeps every thinking block explicitly terminated and avoids emitting duplicate signature-only thinking blocks for the same reasoning payload.

🧹 Nitpick comments (2)
src/routes/messages/stream-translation.ts (2)

230-267: Reasoning-opaque helper currently assumes finish-only usage

handleReasoningOpaque emits a complete thinking block (start → empty thinking_delta → signature_delta → stop) at the current contentBlockIndex but never increments the index afterward. With the current code it is only called from handleFinish, so no subsequent blocks will reuse that index in the same message and this is functionally safe.

However, if this helper is ever reused earlier in the stream (e.g., from handleToolCalls or handleContent), not incrementing state.contentBlockIndex after the content_block_stop will cause index collisions with later blocks. It would be safer to either:

  • Increment state.contentBlockIndex++ after the stop event, or
  • Document that this helper is strictly for “final signature at finish” and should not be used mid-stream.

317-338: Thinking-block closer matches state model, but signature payload is hard-coded

closeThinkingBlockIfOpen correctly:

  • Emits a signature_delta followed by content_block_stop.
  • Increments contentBlockIndex.
  • Resets thinkingBlockOpen.

The only questionable part is that it always uses signature: "". If upstream can ever provide a final reasoning_opaque token on the same chunk that triggers closure, you may eventually want a variant that takes the current Delta and uses the real opaque signature here rather than an empty string. For now this is acceptable as a fallback closer, but it’s another place that will need adjustment if you rework finish-time reasoning handling.
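A sketch of such a variant (the extra delta parameter is hypothetical):

```typescript
// Close an open thinking block, using the real opaque signature when available
function closeThinkingBlockIfOpen(
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
  delta?: Delta,
): void {
  if (!state.thinkingBlockOpen) return
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: {
        type: "signature_delta",
        signature: delta?.reasoning_opaque ?? "",
      },
    },
    {
      type: "content_block_stop",
      index: state.contentBlockIndex,
    },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}
```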

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec66480 and 9adc6e0.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)
🔇 Additional comments (5)
src/routes/messages/stream-translation.ts (5)

1-5: Typed imports for streaming deltas look correct

Importing ChatCompletionChunk, Choice, and Delta from the shared Copilot types keeps this translator aligned with the upstream chunk shape; no issues here.


23-47: Single-choice assumption and handler ordering

translateChunkToAnthropicEvents always takes chunk.choices[0] and ignores any additional choices, and processes the delta in the fixed order: thinking → content → tools → finish. If Copilot ever streams multiple choices, they’ll be dropped here, but if the API contract guarantees a single streaming choice this is fine. The handler ordering also seems consistent with “reasoning first, then content/tools, then finish”.


96-154: Tool call streaming handling is consistent with Anthropic tool_use blocks

Tool call handling looks solid:

  • Closes any open thinking block before tools via closeThinkingBlockIfOpen.
  • Closes an existing content block before starting a new tool_use block.
  • Tracks tool calls by OpenAI tool_calls[index] and maps them to Anthropic content_block_start/input_json_delta events, supporting incremental arguments deltas.

Given the current state model, this should generate valid tool_use blocks without index collisions.


156-195: Content block handling correctly separates text from tools/thinking

The content handler:

  • Closes any open thinking block before text, preserving the “reasoning first, then visible text” invariant.
  • Detects and closes an open tool block via isToolBlockOpen before starting a text block.
  • Lazily starts a text block and appends text_delta events for delta.content.

This matches Anthropic’s expectations for text content blocks and keeps block indices advancing correctly.


197-228: message_start initialization and usage wiring look good

handleMessageStart emits a single message_start per stream, initializes usage with effective input tokens (excluding cached tokens), and defers non-zero output_tokens to the final message_delta. This matches the intended pattern of “lightweight start, accurate usage at finish”.

…g order when stream=false and exclude reasoning_opaque from token calculation in calculateMessageTokens
@caozhiyuan force-pushed the feature/chat-completions-reasoning branch from 3f52575 to 3fa5519 on November 19, 2025 22:49
@coderabbitai bot left a comment:

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)

49-96: Critical: Thinking blocks are not explicitly closed before message finishes

A previous review flagged this issue and marked it as addressed (commit ec66480), but the fix is not present in the current code.

The bug: When reasoning_text deltas are followed by a chunk containing only finish_reason (no content, tool_calls, or reasoning_text), state.thinkingBlockOpen remains true, but handleFinish only closes contentBlockOpen (line 59). The thinking block never receives an explicit content_block_stop before message_stop (lines 91-93), violating Anthropic's streaming protocol.

Example flow:

  1. Delta 1: { reasoning_text: "thinking..." } → opens thinking block, sets thinkingBlockOpen = true
  2. Delta 2: { finish_reason: "stop" }handleFinish emits message_stop without closing thinking block

Anthropic's protocol requires all content blocks (including thinking blocks) to be explicitly terminated with content_block_stop before message_stop.

Apply this diff to close any open thinking block before emitting the final message events:

 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    // Close any open thinking block before finishing
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       const toolBlockOpen = isToolBlockOpen(state)
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
       if (!toolBlockOpen) {
         handleReasoningOpaque(choice.delta, events, state)
       }
     }

     events.push(
       {
         type: "message_delta",
         delta: {
           stop_reason: mapOpenAIStopReasonToAnthropic(choice.finish_reason),
           stop_sequence: null,
         },
         usage: {
           input_tokens:
             (chunk.usage?.prompt_tokens ?? 0)
             - (chunk.usage?.prompt_tokens_details?.cached_tokens ?? 0),
           output_tokens: chunk.usage?.completion_tokens ?? 0,
           ...(chunk.usage?.prompt_tokens_details?.cached_tokens
             !== undefined && {
             cache_read_input_tokens:
               chunk.usage.prompt_tokens_details.cached_tokens,
           }),
         },
       },
       {
         type: "message_stop",
       },
     )
   }
 }
🧹 Nitpick comments (1)
src/routes/messages/stream-translation.ts (1)

250-288: Simplify signature-only thinking blocks by removing unnecessary empty thinking_delta

Line 269-271 emits a thinking_delta with an empty string before the signature_delta, but Anthropic's streaming protocol does not require this. A thinking block may contain only content_block_start, signature_delta, and content_block_stop without any thinking_delta event.

Remove the empty thinking_delta (lines 269-271) to reduce unnecessary events in the stream:

{
  type: "content_block_start",
  index: state.contentBlockIndex,
  content_block: {
    type: "thinking",
    thinking: "",
  },
},
-{
-  type: "content_block_delta",
-  index: state.contentBlockIndex,
-  delta: {
-    type: "thinking_delta",
-    thinking: "",
-  },
-},
{
  type: "content_block_delta",
  index: state.contentBlockIndex,
  delta: {
    type: "signature_delta",
    signature: delta.reasoning_opaque,
  },
},
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa5519 and dfb40d2.

📒 Files selected for processing (1)
  • src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)

Copilot AI review requested due to automatic review settings December 10, 2025 14:14
@caozhiyuan changed the title from "feature about chat completions reasoning, support gemini-3-pro works ok" to "feature about chat completions reasoning, support gemini-3-pro works ok and support claude model enable thinking" on Dec 10, 2025
Copilot AI (Contributor) left a comment:

Pull request overview

This PR adds support for reasoning/thinking capabilities by introducing the translation of Anthropic "thinking" blocks to OpenAI's reasoning_text and reasoning_opaque fields. The changes enable richer reasoning metadata to flow between Anthropic and OpenAI message formats in both streaming and non-streaming modes, with support for thinking budgets and proper token counting exclusions.

Key Changes:

  • Added signature field to AnthropicThinkingBlock and corresponding streaming state tracking with thinkingBlockOpen
  • Implemented bidirectional translation between Anthropic thinking blocks and OpenAI reasoning fields (reasoning_text, reasoning_opaque)
  • Introduced thinking budget calculation and model-specific filtering for Claude models
  • Updated tokenizer to exclude reasoning_opaque from token counts

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

Summary per file:

| File | Description |
| --- | --- |
| src/routes/messages/anthropic-types.ts | Added signature field to AnthropicThinkingBlock and thinkingBlockOpen to stream state |
| src/routes/messages/non-stream-translation.ts | Implemented thinking budget calculation, signature extraction, and model-specific filtering for Claude |
| src/routes/messages/stream-translation.ts | Added streaming handlers for thinking text, reasoning opaque, and thinking block lifecycle management |
| src/routes/messages/handler.ts | Initialized thinkingBlockOpen state for streaming |
| src/services/copilot/create-chat-completions.ts | Added reasoning_text and reasoning_opaque to Delta, ResponseMessage, and Message interfaces; added thinking_budget parameter |
| src/services/copilot/get-models.ts | Added max_thinking_budget and min_thinking_budget to ModelSupports interface |
| src/lib/tokenizer.ts | Excluded reasoning_opaque from token count calculations |
| src/start.ts | Set Bun server idleTimeout to 0 |
| tests/anthropic-request.test.ts | Added signature field to test cases and updated assertions to check reasoning_text |
| tests/anthropic-response.test.ts | Added thinkingBlockOpen initialization to stream state in tests |


})
}

// handle for claude model
Copilot AI commented on Dec 10, 2025:

The comment "handle for claude model" is unclear and doesn't explain what the specific edge case is or why empty content with reasoning_opaque needs special handling. Consider updating to something like: "Handle Claude models that emit reasoning signature after thinking block with empty content delta".

Suggested change:

- // handle for claude model
+ // Handle Claude models that emit reasoning signature after thinking block with empty content delta

Comment on lines +78 to 79
} else if (model.startsWith("claude-opus-4-")) {
return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
Copilot AI commented on Dec 10, 2025:

The change from model.startsWith("claude-opus-") to model.startsWith("claude-opus-4-") makes the pattern more restrictive. This means models like "claude-opus-3" or "claude-opus-5" would no longer match this condition. If this is intentional to only handle claude-opus-4 variants, the change is correct. However, verify this doesn't break compatibility with other claude-opus versions.

Suggested change:

- } else if (model.startsWith("claude-opus-4-")) {
-   return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
+ } else if (model.startsWith("claude-opus-")) {
+   return model.replace(/^claude-opus-.*/, "claude-opus")

// Process all choices to extract text and tool use blocks
for (const choice of response.choices) {
const textBlocks = getAnthropicTextBlocks(choice.message.content)
const thingBlocks = getAnthropicThinkBlocks(
Copilot AI commented on Dec 10, 2025:

Variable name thingBlocks appears to be a typo and should be thinkBlocks or thinkingBlocks to be consistent with the function name getAnthropicThinkBlocks and the purpose (processing thinking blocks).

Comment on lines +274 to 311
function handleReasoningOpaque(
  delta: Delta,
  events: Array<AnthropicStreamEventData>,
  state: AnthropicStreamState,
) {
  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
    events.push(
      {
        type: "content_block_start",
        index: state.contentBlockIndex,
        content_block: {
          type: "thinking",
          thinking: "",
        },
      },
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "thinking_delta",
          thinking: "",
        },
      },
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "signature_delta",
          signature: delta.reasoning_opaque,
        },
      },
      {
        type: "content_block_stop",
        index: state.contentBlockIndex,
      },
    )
    state.contentBlockIndex++
  }
}
Copilot AI commented on Dec 10, 2025:

The handleReasoningOpaque function doesn't set state.thinkingBlockOpen = true when opening a thinking block, but it does increment state.contentBlockIndex. This could lead to inconsistent state tracking. Consider setting state.thinkingBlockOpen = true after pushing the content_block_start event, or ensure the thinking block lifecycle is properly tracked.

Comment on lines +314 to 365
function handleThinkingText(
  delta: Delta,
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
) {
  if (delta.reasoning_text && delta.reasoning_text.length > 0) {
    if (!state.thinkingBlockOpen) {
      events.push({
        type: "content_block_start",
        index: state.contentBlockIndex,
        content_block: {
          type: "thinking",
          thinking: "",
        },
      })
      state.thinkingBlockOpen = true
    }

    events.push({
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: {
        type: "thinking_delta",
        thinking: delta.reasoning_text,
      },
    })
  }
}

function closeThinkingBlockIfOpen(
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
): void {
  if (state.thinkingBlockOpen) {
    events.push(
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "signature_delta",
          signature: "",
        },
      },
      {
        type: "content_block_stop",
        index: state.contentBlockIndex,
      },
    )
    state.contentBlockIndex++
    state.thinkingBlockOpen = false
  }
}
Copilot AI commented on Dec 10, 2025:

The new streaming translation functions (handleThinkingText, handleReasoningOpaque, closeThinkingBlockIfOpen) lack test coverage. Consider adding tests to verify (see the sketch after this list):

  1. Thinking blocks are opened/closed correctly in streaming
  2. Signature deltas are emitted properly
  3. State transitions (thinkingBlockOpen) work correctly
  4. Empty content with reasoning_opaque is handled properly
  5. Interaction between thinking blocks and content/tool blocks

})
state.contentBlockOpen = false
state.contentBlockIndex++
if (!toolBlockOpen) {
Copilot AI commented on Dec 10, 2025:

In handleFinish, after closing a content block, handleReasoningOpaque is called with choice.delta. However, when finish_reason is present, the delta typically doesn't contain new reasoning_opaque data - this was likely already processed in earlier chunks. This could result in emitting an unnecessary empty thinking block at the end. Consider checking if choice.delta.reasoning_opaque exists before calling handleReasoningOpaque, or verify this behavior matches the expected Anthropic streaming protocol.

Suggested change:

- if (!toolBlockOpen) {
+ if (!toolBlockOpen && choice.delta.reasoning_opaque) {

Comment on lines +217 to +238
if (
  delta.content === ""
  && delta.reasoning_opaque
  && delta.reasoning_opaque.length > 0
) {
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: {
        type: "signature_delta",
        signature: delta.reasoning_opaque,
      },
    },
    {
      type: "content_block_stop",
      index: state.contentBlockIndex,
    },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}
Copilot AI commented on Dec 10, 2025:

When delta.content === "", this special case closes the thinking block but doesn't check if state.thinkingBlockOpen is true. If thinkingBlockOpen is false, this logic would try to close a block that was never opened in the streaming state, potentially emitting incorrect events. Consider adding a check: if (state.thinkingBlockOpen) { ... } to ensure consistency.

fetch: server.fetch as ServerHandler,
port: options.port,
bun: {
idleTimeout: 0,
Copilot AI commented on Dec 10, 2025:

Setting idleTimeout: 0 disables the idle timeout completely, which means connections will never be closed due to inactivity. This could lead to resource exhaustion if many idle connections accumulate. Consider using a reasonable timeout value (e.g., 60 for 60 seconds; Bun's idleTimeout is specified in seconds, not milliseconds) instead, unless there's a specific requirement for infinite idle timeout. Document the reason if this is intentional.

Suggested change:

- idleTimeout: 0,
+ // Close idle connections after 60 seconds to avoid resource exhaustion
+ idleTimeout: 60,

Comment on lines 57 to 72
function getThinkingBudget(
  payload: AnthropicMessagesPayload,
  model: Model | undefined,
): number | undefined {
  const thinking = payload.thinking
  if (model && thinking) {
    const maxThinkingBudget = Math.min(
      model.capabilities.supports.max_thinking_budget ?? 0,
      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
    )
    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
      return Math.min(thinking.budget_tokens, maxThinkingBudget)
    }
  }
  return undefined
}
Copilot AI commented on Dec 10, 2025:

The new getThinkingBudget function and thinking_budget parameter lack test coverage. Consider adding tests to verify (see the sketch after this list):

  1. Correct calculation when model supports thinking budget
  2. Capping at model's max_thinking_budget
  3. Capping at (max_output_tokens - 1)
  4. Returns undefined when model doesn't support thinking budget
  5. Returns undefined when payload.thinking is not provided

@coderabbitai bot left a comment:

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7657d87 and 7f8187b.

📒 Files selected for processing (4)
  • src/routes/messages/non-stream-translation.ts (11 hunks)
  • src/routes/messages/stream-translation.ts (3 hunks)
  • src/services/copilot/create-chat-completions.ts (5 hunks)
  • src/services/copilot/get-models.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/stream-translation.ts
  • src/routes/messages/non-stream-translation.ts
📚 Learning: 2025-11-24T05:50:12.368Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/responses-stream-translation.ts:451-462
Timestamp: 2025-11-24T05:50:12.368Z
Learning: In src/routes/messages/responses-stream-translation.ts, the handleErrorEvent function intentionally does not call closeAllOpenBlocks before sending an error event to the client. This is acceptable in their error handling design - clients handle error events without requiring balanced content_block_stop events.

Applied to files:

  • src/routes/messages/stream-translation.ts
🧬 Code graph analysis (2)
src/routes/messages/stream-translation.ts (3)
src/services/copilot/create-chat-completions.ts (3)
  • Choice (88-93)
  • ChatCompletionChunk (51-70)
  • Delta (72-86)
src/routes/messages/anthropic-types.ts (2)
  • AnthropicStreamState (196-208)
  • AnthropicStreamEventData (185-193)
src/routes/messages/utils.ts (1)
  • mapOpenAIStopReasonToAnthropic (3-16)
src/routes/messages/non-stream-translation.ts (4)
src/lib/state.ts (1)
  • state (20-25)
src/routes/messages/anthropic-types.ts (4)
  • AnthropicMessagesPayload (3-26)
  • AnthropicThinkingBlock (56-60)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
src/services/copilot/get-models.ts (1)
  • Model (44-57)
src/services/copilot/create-chat-completions.ts (1)
  • Message (167-176)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (16)
src/services/copilot/get-models.ts (1)

28-29: LGTM! Type extension for thinking budgets.

The addition of optional max_thinking_budget and min_thinking_budget fields cleanly extends the ModelSupports interface to accommodate per-model thinking budget constraints, maintaining backward compatibility.

src/services/copilot/create-chat-completions.ts (3)

72-86: LGTM! Delta interface extended for reasoning metadata.

The addition of reasoning_text and reasoning_opaque optional fields, along with the export statement, cleanly extends the streaming delta capabilities to support reasoning content propagation.


88-93: LGTM! Choice interface extended and exported.

The addition of the logprobs field and export statement appropriately extends the streaming choice interface.


117-118: LGTM! Consistent reasoning field additions.

The optional reasoning_text, reasoning_opaque, and thinking_budget fields are consistently added across ResponseMessage, ChatCompletionsPayload, and Message interfaces, maintaining backward compatibility while enabling reasoning metadata propagation.

Also applies to: 155-155, 174-175

src/routes/messages/stream-translation.ts (7)

1-5: LGTM! Import updates for typed handlers.

The addition of Choice and Delta type imports appropriately supports the refactored handler functions with proper typing.


36-46: LGTM! Modular handler orchestration.

The refactored main translation flow cleanly separates concerns into dedicated handlers, improving maintainability and readability. The handler invocation sequence is logical and comprehensive.


49-96: LGTM! Finish handler with proper block closure.

The finish handler correctly closes open content blocks, processes reasoning opaque signatures for non-tool blocks, and emits the final message delta and stop events with proper token usage propagation.


98-158: LGTM! Tool call handler with proper block transitions.

The tool call handler correctly closes thinking blocks, processes reasoning opaque signatures, and manages tool_use block lifecycle with appropriate content_block_start, delta, and stop events.


160-174: LGTM! Reasoning opaque handler for tool transitions.

The function correctly closes non-tool content blocks before emitting reasoning opaque signatures during tool call transitions.


176-239: LGTM! Content handler with claude-specific reasoning opaque handling.

The content handler properly closes thinking blocks before text content, manages text block lifecycle, and includes claude-specific handling for reasoning opaque signatures with empty content.


241-272: LGTM! Remaining handlers implement proper block lifecycle.

All remaining handlers correctly manage their respective block lifecycles:

  • handleMessageStart: Emits the initial message_start event once
  • handleReasoningOpaque: Emits complete thinking blocks with signatures and increments the index
  • handleThinkingText: Opens thinking blocks and streams reasoning text deltas
  • closeThinkingBlockIfOpen: Properly terminates thinking blocks with signature deltas

Also applies to: 274-312, 314-341, 343-365
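For reference, a sketch of the wire sequence a single streamed thinking block produces under these handlers, assuming Anthropic's documented streaming event shapes (the index and payloads are illustrative):

// One thinking block: start, reasoning text deltas, signature delta, stop.
const thinkingBlockEvents: Array<AnthropicStreamEventData> = [
  { type: "content_block_start", index: 0,
    content_block: { type: "thinking", thinking: "", signature: "" } },
  { type: "content_block_delta", index: 0,
    delta: { type: "thinking_delta", thinking: "Checking the tool schema first…" } },
  { type: "content_block_delta", index: 0,
    delta: { type: "signature_delta", signature: "<opaque signature>" } },
  { type: "content_block_stop", index: 0 },
]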

src/routes/messages/non-stream-translation.ts (5)

35-37: LGTM! Model lookup for thinking budget computation.

The model lookup logic correctly translates the model name, finds the model in state, and passes it to thinking budget computation. The optional chaining on state.models?.data handles cases where models may not be loaded.


57-72: LGTM! Thinking budget computation with proper bounds.

The function correctly computes the thinking budget by respecting both model-specific limits and payload requests, appropriately handling undefined models and returning undefined when thinking is not configured.


169-195: LGTM! Claude-specific thinking block filtering.

The claude-specific filtering correctly ensures that only thinking blocks with valid content and signatures (excluding GPT-style signatures containing "@") are processed, improving cross-model compatibility.


238-240: LGTM! Explicit text-only filtering in content mapping.

The updated filtering correctly isolates text blocks and excludes thinking blocks from content mapping when no images are present.


377-400: LGTM! Thinking block construction preserves both fields.

The function correctly constructs thinking blocks from reasoning metadata:

  • When both reasoning_text and reasoning_opaque are present, they're combined into a single block
  • When only one is present, it creates an appropriate block with the available field
  • No data loss occurs
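A compact sketch of that construction under a hypothetical helper name (the PR's actual getAnthropicThinkBlocks may differ in detail):

// Map OpenAI reasoning metadata to Anthropic thinking blocks without losing the signature.
function buildThinkBlocks(
  reasoningText?: string,
  reasoningOpaque?: string,
): Array<AnthropicThinkingBlock> {
  if (reasoningText) {
    // Combined case: visible reasoning plus its signature (empty string when absent).
    return [{ type: "thinking", thinking: reasoningText, signature: reasoningOpaque || "" }]
  }
  if (reasoningOpaque) {
    // Signature-only case: emit a block so the opaque payload still round-trips.
    return [{ type: "thinking", thinking: "", signature: reasoningOpaque }]
  }
  return []
}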

@caozhiyuan
Contributor Author

@ericc-ch This PR also supports the thinking feature for Claude models; there are no outstanding objections and it can be merged. In Claude Code, thinking can now be toggled on.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/routes/messages/non-stream-translation.ts (1)

57-75: Clamp final thinking budget so it never exceeds maxThinkingBudget

The current logic can return a value larger than maxThinkingBudget if model.capabilities.supports.min_thinking_budget is greater than the computed maxThinkingBudget (e.g., misconfigured model caps). In that case:

  • maxThinkingBudget might be 512
  • min_thinking_budget might be 1024
  • Math.max(budgetTokens, min_thinking_budget) will return 1024, overshooting the max you just computed.

To be defensive against inconsistent caps, consider clamping the final value back down to maxThinkingBudget:

-    const maxThinkingBudget = Math.min(
-      model.capabilities.supports.max_thinking_budget ?? 0,
-      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
-    )
-    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
-      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
-      return Math.max(
-        budgetTokens,
-        model.capabilities.supports.min_thinking_budget ?? 1024,
-      )
-    }
+    const maxThinkingBudget = Math.min(
+      model.capabilities.supports.max_thinking_budget ?? 0,
+      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
+    )
+    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
+      const minThinkingBudget =
+        model.capabilities.supports.min_thinking_budget ?? 1024
+      const requested = Math.min(thinking.budget_tokens, maxThinkingBudget)
+      const lowerBound = Math.min(minThinkingBudget, maxThinkingBudget)
+      const clamped = Math.max(requested, lowerBound)
+      return clamped
+    }

Also worth adding a couple of unit tests around the edge cases (min > max, missing caps, budget_tokens > max) to lock in the intended behavior.
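A sketch of the min > max edge case as a test, again assuming a bun:test runner and minimal stubs cast to the real types:

import { expect, test } from "bun:test"

// Misconfigured caps: min_thinking_budget (1024) exceeds the computed max (512).
const misconfigured = {
  capabilities: {
    supports: { max_thinking_budget: 512, min_thinking_budget: 1024 },
    limits: { max_output_tokens: 4096 },
  },
} as Model

test("clamps back down to maxThinkingBudget when min > max", () => {
  const budget = getThinkingBudget(
    { thinking: { budget_tokens: 256 } } as AnthropicMessagesPayload,
    misconfigured,
  )
  // lowerBound = min(1024, 512) = 512; clamped = max(min(256, 512), 512) = 512
  expect(budget).toBe(512)
})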

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7f8187b and cbe12eb.

📒 Files selected for processing (1)
  • src/routes/messages/non-stream-translation.ts (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (4)
src/lib/state.ts (1)
  • state (20-25)
src/routes/messages/anthropic-types.ts (4)
  • AnthropicMessagesPayload (3-26)
  • AnthropicThinkingBlock (56-60)
  • AnthropicTextBlock (28-31)
  • AnthropicAssistantContentBlock (67-70)
src/services/copilot/get-models.ts (1)
  • Model (44-57)
src/services/copilot/create-chat-completions.ts (1)
  • Message (167-176)
🔇 Additional comments (4)
src/routes/messages/non-stream-translation.ts (4)

32-55: Model normalization and thinking_budget wiring look consistent

Using translateModelName to derive modelId, looking up the Model from state.models, and then:

  • passing modelId both as the outbound model and into translateAnthropicMessagesToOpenAI, and
  • computing thinking_budget via getThinkingBudget only when a matching Model and payload.thinking exist

is a clean setup and avoids sending a thinking budget for unknown/unsupported models. No issues here.


78-86: Claude-specific thinking-block filtering looks reasonable

Passing modelId into handleAssistantMessage and then:

  • treating models with modelId.startsWith("claude") specially, and
  • requiring non-empty thinking and signature, while excluding signatures containing "@" (to avoid misclassifying GPT signatures as Claude reasoning)

is a sensible way to avoid mixing vendors’ reasoning metadata. The aggregation of:

  • reasoning_text as joined thinking contents, and
  • reasoning_opaque as the first non-empty signature

also looks correct for feeding downstream consumers.

Also applies to: 156-199
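A sketch of that filter-and-aggregate step with hypothetical local names (the "@" exclusion and the claude prefix check are taken from the review text above; blocks and modelId are assumed locals):

// Keep only valid Claude-style thinking blocks, then aggregate for the outbound message.
const isClaude = modelId.startsWith("claude")
const thinkingBlocks = blocks.filter(
  (block): block is AnthropicThinkingBlock =>
    block.type === "thinking"
    && block.thinking.length > 0
    && block.signature.length > 0
    // GPT-style signatures contain "@"; exclude them when targeting Claude.
    && (!isClaude || !block.signature.includes("@")),
)
const reasoning_text =
  thinkingBlocks.map((block) => block.thinking).join("\n") || undefined
const reasoning_opaque = thinkingBlocks.find(
  (block) => block.signature.length > 0,
)?.signature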


227-245: mapContent correctly strips thinking blocks from visible assistant/user content

In the no-image path, mapContent now filters to block.type === "text" before joining, which ensures:

  • thinking blocks never leak into the visible content string, and
  • reasoning is only exposed via the dedicated reasoning_text/reasoning_opaque fields.

This separation is important for clients that expect assistant content to be user-facing only.


319-335: Thinking/text/tool aggregation and Anthropic thinking block mapping look good

  • translateToAnthropic now builds a single assistantContentBlocks array per response, pushing thinkBlocks, textBlocks, and toolUseBlocks per choice.
  • getAnthropicTextBlocks skips empty strings, avoiding empty text blocks.
  • getAnthropicThinkBlocks now preserves reasoning_opaque when reasoning_text is present by setting signature: reasoningOpaque || "", and still emits a single thinking block when only reasoning_opaque is provided.

Overall this gives a consistent round-trip: OpenAI reasoning_text/reasoning_opaque → Anthropic thinking block(s) → back to OpenAI reasoning fields, without dropping the signature anymore.

Also applies to: 343-349, 365-404


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/services/get-vscode-version.ts (1)

33-33: Consider lazy evaluation instead of module-level execution.

The module-level await causes a network fetch on every import, potentially delaying module load by up to 5 seconds on timeout. Consider removing this line and invoking getVSCodeVersion() only when needed (lazy evaluation), or caching the result if multiple consumers require it.
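One way to express the lazy, cached variant the comment suggests (a sketch; it assumes getVSCodeVersion is an async () => Promise<string> in the same module):

// Memoize the lookup: the first caller pays the network cost, later callers reuse the promise.
let cachedVersion: Promise<string> | undefined

export function getVSCodeVersionLazy(): Promise<string> {
  cachedVersion ??= getVSCodeVersion().catch((error) => {
    // Drop the cached rejection so a later call can retry the fetch.
    cachedVersion = undefined
    throw error
  })
  return cachedVersion
}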

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between cbe12eb and ebcacb2.

📒 Files selected for processing (2)
  • src/lib/api-config.ts (1 hunks)
  • src/services/get-vscode-version.ts (1 hunks)
🔇 Additional comments (2)
src/lib/api-config.ts (2)

10-12: Version bump for plugin/user-agent headers looks consistent

Updating COPILOT_VERSION and deriving both EDITOR_PLUGIN_VERSION and USER_AGENT from it keeps header values internally consistent. As long as 0.33.5 matches the actual shipped plugin version you want to advertise, this change is fine.

Please double‑check that 0.33.5 is the intended public/plugin version string for all clients using these headers.


14-47: API version header bump may affect backend compatibility

Changing API_VERSION to "2025-10-01" will be sent in x-github-api-version for both Copilot and GitHub API calls. Behavior is unchanged locally, but if this date doesn’t correspond to a valid/stable API version on the server side, calls could fail or change semantics.

Confirm that "2025-10-01" is a supported GitHub/Copilot API version in your target environment and that you’ve validated flows (auth, chat, completions, etc.) against it.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/api-config.ts (1)

19-31: Guard against Bearer undefined / vscode/undefined in headers.
State.copilotToken and State.vsCodeVersion are optional, but are interpolated unconditionally. Fail fast (or provide defaults) to avoid confusing auth failures.

 export const copilotHeaders = (state: State, vision: boolean = false) => {
+  if (!state.copilotToken) throw new Error("Missing state.copilotToken")
+  if (!state.vsCodeVersion) throw new Error("Missing state.vsCodeVersion")
   const headers: Record<string, string> = {
     Authorization: `Bearer ${state.copilotToken}`,
     "content-type": standardHeaders()["content-type"],
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0d6f7aa and dcafbe1.

📒 Files selected for processing (1)
  • src/lib/api-config.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/lib/api-config.ts (1)
src/lib/state.ts (2)
  • state (20-25)
  • State (3-18)
🔇 Additional comments (2)
src/lib/api-config.ts (2)

27-27: Remove unverified concern about openai-intent header value.

The value conversation-agent was deliberately set (per git commit message "correct openai-intent header value") and is used consistently across all GitHub Copilot API endpoints. There is no evidence of conversation-panel being a valid alternative, and the project targets a single upstream (GitHub Copilot API), not multiple upstreams. This header value has been stable since inception with no reported issues.


10-15: Verify these version values are currently supported upstream—format is correct but versions may be unsupported or deprecated.

COPILOT_VERSION="0.35.0" and API_VERSION="2025-10-01" use GitHub's documented formats (x-github-api-version accepts YYYY-MM-DD dates; GitHubCopilotChat/<version> is the correct user agent pattern). However, you must confirm these specific values are accepted by the Copilot API endpoint—if they've been deprecated or aren't released yet, requests fail at runtime.

Additionally, line 21 silently passes Bearer undefined if state.copilotToken is missing, and line 17 does not validate state.accountType before constructing the URL.

Comment on lines 16 to +18
 export const copilotBaseUrl = (state: State) =>
-  state.accountType === "individual" ?
-    "https://api.githubcopilot.com"
-  : `https://api.${state.accountType}.githubcopilot.com`
+  `https://api.${state.accountType}.githubcopilot.com`



⚠️ Potential issue | 🟠 Major

copilotBaseUrl should not blindly template state.accountType (host correctness + potential SSRF surface).
If accountType isn’t strictly validated to a small enum, this can generate invalid/unexpected hosts. Consider mapping known values and defaulting/throwing on unknowns.

 export const copilotBaseUrl = (state: State) =>
-  `https://api.${state.accountType}.githubcopilot.com`
+  `https://api.${normalizeCopilotAccountType(state.accountType)}.githubcopilot.com`
+
+const normalizeCopilotAccountType = (accountType: string) => {
+  // adjust allowed values to your actual supported set
+  if (accountType === "individual" || accountType === "enterprise") return accountType
+  throw new Error(`Unsupported accountType: ${accountType}`)
+}
🤖 Prompt for AI Agents
In src/lib/api-config.ts around lines 16 to 18, copilotBaseUrl currently
templates state.accountType directly which can produce invalid hosts or open an
SSRF surface; replace the direct template with a strict mapping from allowed
accountType values (e.g., map 'personal'|'business'|'enterprise' to their exact
host fragments) and either default to a safe known host or throw/return an error
for unknown values, ensuring only validated enum values are used to build the
URL rather than interpolating raw input.
