feature: chat completions reasoning; supports gemini-3-pro (works OK) and enables thinking for Claude models #152
base: master
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduces thinking-block metadata and a streaming flag, separates reasoning into reasoning_text and reasoning_opaque (signature), refactors stream translation into modular handlers with explicit block lifecycle and delta sequencing, exposes optional reasoning fields in copilot types, excludes reasoning_opaque from tokenization, updates tests, and sets Bun idleTimeout to 0.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Handler
    participant Translator as StreamTranslator
    participant Emitter as EventEmitter
    Client->>Handler: send streaming chunk
    Handler->>Translator: translateChunkToAnthropicEvents(chunk, state)
    alt chunk has reasoning_text
        Translator->>Emitter: content_block_start (thinking)
        Translator->>Emitter: thinking_delta (reasoning_text)
        note right of Translator: thinkingBlockOpen = true
    end
    alt chunk has reasoning_opaque (signature)
        Translator->>Translator: closeThinkingBlockIfOpen()
        Translator->>Emitter: signature_delta (reasoning_opaque)
        Translator->>Emitter: content_block_stop (thinking)
        note right of Translator: thinkingBlockOpen = false
    end
    alt chunk has text or tool_calls
        Translator->>Emitter: handleContent() / handleToolCalls() -> text_delta / tool_use deltas
    end
    alt stream finishes
        Translator->>Translator: handleFinish() (close blocks, emit stop_reason)
        Translator->>Emitter: message_delta + message_stop
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Pre-merge checks and finishing touches

- ❌ Failed checks (1 warning, 1 inconclusive)
- ✅ Passed checks (1 passed)
@ericc-ch This adds chat completions reasoning support; gemini-3-pro works OK. Please help review.
Actionable comments posted: 1
🧹 Nitpick comments (7)
tests/anthropic-response.test.ts (1)
250-256: `AnthropicStreamState` initialization updated correctly; consider a DRY helper

Adding `thinkingBlockOpen: false` keeps the test state in sync with the new `AnthropicStreamState` shape. You now duplicate the full initial state literal in multiple tests; consider a small helper like `createInitialStreamState()` to avoid future drift when adding more fields (a sketch follows).

Also applies to: 351-357
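A minimal sketch of such a helper. The field list is assumed from the `AnthropicStreamState` members named elsewhere in this review, and the import path is illustrative, so adjust both to the real interface and test layout:

```ts
// Hypothetical test helper; fields assumed from this review, not verified
// against the actual AnthropicStreamState interface.
import type { AnthropicStreamState } from "../src/routes/messages/anthropic-types"

function createInitialStreamState(): AnthropicStreamState {
  return {
    messageStartSent: false,
    contentBlockIndex: 0,
    contentBlockOpen: false,
    thinkingBlockOpen: false,
    toolCalls: {},
  } as AnthropicStreamState
}
```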
src/routes/messages/anthropic-types.ts (1)
56-60: Thinking block `signature` and `signature_delta` wiring looks consistent, but the type is stricter than usage

Making `AnthropicThinkingBlock` carry a `signature: string` and adding the `signature_delta` variant to `AnthropicContentBlockDeltaEvent` matches how the stream translator now emits signatures and how non-stream translation constructs thinking blocks.

One nuance: elsewhere you defensively treat `signature` as optional/possibly empty (e.g., checking `b.signature && b.signature.length > 0` and sometimes emitting `signature: ""`). If the upstream protocol allows thinking blocks without a signature, you might want `signature?: string` to better reflect the wire shape, instead of relying on empty strings as a sentinel.

Also applies to: 140-148
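A sketch of that looser wire shape, in case optionality turns out to be the real contract (the other fields follow the `AnthropicThinkingBlock` shape described in this review):

```ts
// Hypothetical variant: signature is absent, rather than "", when upstream
// sends a thinking block without one.
interface AnthropicThinkingBlock {
  type: "thinking"
  thinking: string
  signature?: string
}
```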
tests/anthropic-request.test.ts (1)
128-158: Tests exercise `reasoning_text` but not `reasoning_opaque` mapping

The two thinking-block tests now correctly:

- Include `signature` on Anthropic thinking blocks.
- Assert that `reasoning_text` is populated while `content` only carries user-visible text.

However, neither test asserts that the Anthropic `signature` ends up on `assistantMessage.reasoning_opaque`. Adding something like `expect(assistantMessage?.reasoning_opaque).toBe("abc123")` and `expect(assistantMessage?.reasoning_opaque).toBe("def456")` would close the coverage gap for the opaque reasoning metadata as well. You might also tweak the test comments to clarify that thinking content is surfaced via `reasoning_text`, not merged into `content`.

Also applies to: 160-201
src/services/copilot/create-chat-completions.ts (1)
72-86: Reasoning fields and exported streaming types are wired consistently

Exporting `Delta` and `Choice` and adding `reasoning_text`/`reasoning_opaque` to both streaming deltas and non-stream messages matches how the translation layers consume them. The translator code only treats these as optional and nullable, which is consistent with the type definitions here.

If you want to tighten typings later, you could replace `object` for `logprobs` with a more specific shape or `Record<string, unknown>`, but that's not blocking.

Also applies to: 88-93, 114-120, 166-175
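If you do tighten it, a sketch of how the type could look inside create-chat-completions.ts. Only the `logprobs` change is the point; the other `Choice` fields are assumed from how this review describes the interface (lines 88-93):

```ts
// Illustrative only: swaps `object` for Record<string, unknown> on logprobs.
export interface Choice {
  index: number
  delta: Delta
  finish_reason: string | null
  logprobs: Record<string, unknown> | null
}
```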
src/routes/messages/non-stream-translation.ts (2)
274-296: `getAnthropicThinkBlocks` drops the opaque signature when both text and opaque are present

In `translateToAnthropic`, thinking blocks are reconstructed from:

- `choice.message.reasoning_text`
- `choice.message.reasoning_opaque`

via `getAnthropicThinkBlocks`. Right now that helper does:

```ts
if (reasoningText) {
  return [{ type: "thinking", thinking: reasoningText, signature: "" }]
}
if (reasoningOpaque) {
  return [{ type: "thinking", thinking: "", signature: reasoningOpaque }]
}
```

So when both `reasoning_text` and `reasoning_opaque` are present, the opaque portion is silently discarded for the non-stream Anthropic response. If upstream ever sends both fields together, you likely want to preserve the signature as well, e.g., by returning a single block with both populated or two blocks (one text, one signature).

Suggestion: adjust `getAnthropicThinkBlocks` to handle the "both present" case explicitly instead of treating them as mutually exclusive, as sketched below.

Also applies to: 320-359
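One possible shape for the "both present" case, returning a single combined block. Parameter types are assumed from the call site; a later revision of this PR adopts essentially this approach:

```ts
// Sketch: preserve both fields instead of treating them as mutually exclusive.
function getAnthropicThinkBlocks(
  reasoningText: string | undefined,
  reasoningOpaque: string | undefined,
): Array<AnthropicThinkingBlock> {
  if (!reasoningText && !reasoningOpaque) {
    return []
  }
  return [
    {
      type: "thinking",
      thinking: reasoningText ?? "",
      signature: reasoningOpaque ?? "",
    },
  ]
}
```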
274-296: Multi-choice stop reason precedence is a bit opaque

`stopReason` is initialized from the first choice and then updated with:

```ts
if (choice.finish_reason === "tool_calls" || stopReason === "stop") {
  stopReason = choice.finish_reason
}
```

This causes a later non-tool reason (e.g., `"length"`) to overwrite an earlier `"stop"` reason, which may not be what you intend. If the goal is simply "prefer `tool_calls` over everything else," a more direct rule (e.g., only overwrite when `choice.finish_reason === "tool_calls"`) would be easier to reason about.

Not a blocker, but worth double-checking the intended precedence.
src/routes/messages/stream-translation.ts (1)
227-264: Consider advancing `contentBlockIndex` after `handleReasoningOpaque` emits a full thinking block

`handleReasoningOpaque` emits a complete thinking block lifecycle:

- `content_block_start` (type `"thinking"`)
- a dummy `thinking_delta`
- a `signature_delta`
- `content_block_stop`

but it never updates `state.contentBlockIndex` or any open/closed flags. As a result, the next text or tool block may reuse the same `index` as this ephemeral thinking block, which can make the event stream harder to reason about for consumers that treat indices as unique per block.

A low-impact improvement would be to increment `state.contentBlockIndex` at the end of `handleReasoningOpaque` (and keep `contentBlockOpen`/`thinkingBlockOpen` false), so each emitted block gets a distinct index.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/routes/messages/anthropic-types.ts (2 hunks)
- src/routes/messages/handler.ts (1 hunks)
- src/routes/messages/non-stream-translation.ts (5 hunks)
- src/routes/messages/stream-translation.ts (3 hunks)
- src/services/copilot/create-chat-completions.ts (4 hunks)
- tests/anthropic-request.test.ts (4 hunks)
- tests/anthropic-response.test.ts (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
- src/routes/messages/anthropic-types.ts
- src/routes/messages/stream-translation.ts
- src/routes/messages/non-stream-translation.ts
- src/routes/messages/handler.ts
- tests/anthropic-request.test.ts
🧬 Code graph analysis (2)
src/routes/messages/stream-translation.ts (3)
- src/services/copilot/create-chat-completions.ts (3): Choice (88-93), ChatCompletionChunk (51-70), Delta (72-86)
- src/routes/messages/anthropic-types.ts (2): AnthropicStreamState (196-208), AnthropicStreamEventData (185-193)
- src/routes/messages/utils.ts (1): mapOpenAIStopReasonToAnthropic (3-16)
src/routes/messages/non-stream-translation.ts (1)
- src/routes/messages/anthropic-types.ts (3): AnthropicTextBlock (28-31), AnthropicAssistantContentBlock (67-70), AnthropicThinkingBlock (56-60)
🔇 Additional comments (4)
src/routes/messages/anthropic-types.ts (1)
195-208: New `thinkingBlockOpen` flag in stream state aligns with streaming helpers

Adding `thinkingBlockOpen: boolean` to `AnthropicStreamState` matches how `stream-translation.ts` now manages open thinking blocks separately from regular content blocks. The field is initialized in both the handler and tests, so the state shape is consistent across call sites.

src/routes/messages/handler.ts (1)

56-64: Streaming state now correctly initializes `thinkingBlockOpen`

Including `thinkingBlockOpen: false` in the handler's `streamState` keeps the runtime state in sync with `AnthropicStreamState` and the streaming translator's expectations. No issues here.

src/routes/messages/non-stream-translation.ts (1)

126-180: Assistant reasoning is cleanly separated from visible content

`handleAssistantMessage` + the updated `mapContent` now:

- Aggregate all `thinking` blocks into `allThinkingContent` and expose it via `reasoning_text`.
- Select a single non-empty `signature` into `reasoning_opaque`.
- Ensure `content` only contains text (and images when present), with thinking and tool blocks filtered out.

This achieves the PR goal of keeping reasoning separate from user-visible text and structuring tool calls via `tool_calls`. The `mapContent` filter to only include `text` blocks for non-image content is particularly nice for avoiding accidental leakage of non-display blocks.

Also applies to: 182-222
src/routes/messages/stream-translation.ts (1)
23-47: Streaming translator refactor improves readability and separation of concerns

The split of `translateChunkToAnthropicEvents` into `handleMessageStart`, `handleThinkingText`, `handleContent`, `handleToolCalls`, and `handleFinish` makes the streaming pipeline much easier to follow. State mutations (`messageStartSent`, `contentBlockIndex`, `contentBlockOpen`, `thinkingBlockOpen`, `toolCalls`) are now localized, which should reduce future bugs around mixing text/tool/thinking content.

Also applies to: 194-225
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/start.ts (1)
120-122: Disabling Bun idle timeout is helpful for streaming but has resource trade-offs

Setting `bun.idleTimeout = 0` will keep idle keep-alive connections open indefinitely, which is good for long-running reasoning/streaming but can retain dead or abandoned connections and increase resource usage under load.

Consider:

- Making the idle timeout configurable via an env var (e.g., `BUN_IDLE_TIMEOUT_SECONDS`, defaulting to `0` or a high but finite value), and/or
- Documenting that `0` is intentional for long-lived streaming so operators know the impact if this is ever used in a shared/production setting.

A sketch of the env-var option follows this list.
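A sketch of that env-var approach, reusing the `serve`/`ServerHandler` wiring shown later in this PR; `BUN_IDLE_TIMEOUT_SECONDS` is a hypothetical variable name:

```ts
// Hypothetical: default to 0 (never time out) to keep current behavior, but
// let operators bound idle connections in shared deployments.
const idleTimeout = Number(process.env.BUN_IDLE_TIMEOUT_SECONDS ?? "0")

serve({
  fetch: server.fetch as ServerHandler,
  port: options.port,
  bun: {
    idleTimeout,
  },
})
```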
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/routes/messages/non-stream-translation.ts (5 hunks)
- src/start.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (1)
- src/routes/messages/anthropic-types.ts (3): AnthropicTextBlock (28-31), AnthropicAssistantContentBlock (67-70), AnthropicThinkingBlock (56-60)
🔇 Additional comments (4)
src/routes/messages/non-stream-translation.ts (4)
146-153: LGTM!

The extraction of thinking content and signature from multiple thinking blocks is well-handled. Concatenating thinking content and taking the first non-empty signature is a reasonable approach for the translation.

155-179: LGTM!

The reasoning fields are correctly added to assistant messages in both code paths (with and without tool calls), ensuring consistent handling of thinking block data.

194-200: LGTM!

The explicit filter for text blocks ensures that thinking blocks are excluded from regular content, which is correct since they're now handled separately via reasoning fields.

278-296: LGTM!

The refactor to use a unified `assistantContentBlocks` array that includes text, thinking, and tool use blocks is cleaner and correctly preserves the stop reason logic.
Force-pushed from 6c385b7 to 65a3568
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/routes/messages/non-stream-translation.ts (1)
278-291: Anthropic response reconstruction is correct; consider ordering and naming nits

Combining `text`, `thinking`, and `tool_use` blocks into `assistantContentBlocks` using the new reasoning fields is logically sound. Two optional cleanups you might consider:

- Put `thingBlocks` (thinking) before `textBlocks` to better mirror common Anthropic ordering (`thinking` preceding final text), if that matches how other paths structure content.
- Rename `thingBlocks` → `thinkingBlocks` for clarity.

Both are non-blocking and mostly about readability/consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/routes/messages/non-stream-translation.ts (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (1)
- src/routes/messages/anthropic-types.ts (3): AnthropicTextBlock (28-31), AnthropicAssistantContentBlock (67-70), AnthropicThinkingBlock (56-60)
🔇 Additional comments (3)
src/routes/messages/non-stream-translation.ts (3)
142-179: Reasoning metadata extraction in assistant messages looks solid; confirm multi-signature expectations

The separation of thinking content into `reasoning_text`/`reasoning_opaque` while keeping `content` free of `thinking` blocks is correct and symmetric with the Anthropic→OpenAI path. One behavioral detail: when multiple thinking blocks carry different `signature` values, only the first non-empty signature is propagated. If upstream could ever emit multiple distinct signatures per message, that information would be collapsed here; otherwise this is fine as-is.

194-201: mapContent correctly strips non-text (including thinking) from visible content

The updated non-image path returning only joined `text` blocks is a good change: `thinking` and other non-text blocks are no longer mixed into the user-visible `content`, which aligns with the new dedicated reasoning fields and keeps Anthropic internals out of the OpenAI surface.

336-359: getAnthropicThinkBlocks now preserves both reasoning fields as intended

This helper correctly emits a single `AnthropicThinkingBlock` whenever either `reasoningText` or `reasoningOpaque` is non-empty, and preserves both when both are present (`thinking: reasoningText`, `signature: reasoningOpaque || ""`). That resolves the earlier data-loss concern around dropping `reasoning_opaque` and keeps the types aligned with `AnthropicThinkingBlock`.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/routes/messages/stream-translation.ts (1)
23-47: Standalone `reasoning_opaque` deltas are not processed

If a delta contains only `reasoning_opaque` (no `reasoning_text`, `tool_calls`, `content`, or `finish_reason`), it will be silently ignored:

- `handleThinkingText` only processes `reasoning_opaque` when `reasoning_text` is present (line 300)
- `handleToolCalls` calls `handleReasoningOpaque` only when `tool_calls` exist (line 105)
- `handleFinish` calls `handleReasoningOpaque` only when `finish_reason` exists (line 68)

Per the streaming protocol, deltas may carry only opaque reasoning tokens without other fields (as noted in the previous review's web search).

Option 1: Unconditionally process `reasoning_opaque` in `handleThinkingText`, regardless of whether `reasoning_text` is present:

```diff
 function handleThinkingText(
   delta: Delta,
   state: AnthropicStreamState,
   events: Array<AnthropicStreamEventData>,
 ) {
   if (delta.reasoning_text && delta.reasoning_text.length > 0) {
     if (!state.thinkingBlockOpen) {
       events.push({
         type: "content_block_start",
         index: state.contentBlockIndex,
         content_block: {
           type: "thinking",
           thinking: "",
         },
       })
       state.thinkingBlockOpen = true
     }
     events.push({
       type: "content_block_delta",
       index: state.contentBlockIndex,
       delta: {
         type: "thinking_delta",
         thinking: delta.reasoning_text,
       },
     })
+  }
-  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+  // Handle reasoning_opaque even if there's no reasoning_text
+  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+    if (state.thinkingBlockOpen) {
+      // Close open thinking block with signature
       events.push(
         {
           type: "content_block_delta",
           index: state.contentBlockIndex,
           delta: {
             type: "signature_delta",
             signature: delta.reasoning_opaque,
           },
         },
         {
           type: "content_block_stop",
           index: state.contentBlockIndex,
         },
       )
       state.contentBlockIndex++
       state.thinkingBlockOpen = false
+    } else {
+      // No thinking block open - create complete reasoning block
+      handleReasoningOpaque(delta, events, state)
     }
   }
 }
```

Option 2: Add a standalone check in the main function before `handleFinish`:

```diff
   handleContent(delta, state, events)
   handleToolCalls(delta, state, events)
+  // Handle standalone reasoning_opaque not caught by other handlers
+  if (
+    delta.reasoning_opaque
+    && !state.thinkingBlockOpen
+    && !state.contentBlockOpen
+  ) {
+    handleReasoningOpaque(delta, events, state)
+  }
   handleFinish(choice, state, { events, chunk })
```

I recommend Option 1 as it centralizes all reasoning handling in one place.
♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)
49-94: Thinking blocks must be closed before message finishes

If `state.thinkingBlockOpen` is `true` when `finish_reason` arrives, the thinking block is never explicitly closed before `message_stop`. Per the streaming protocol, thinking/content blocks must receive an explicit `content_block_stop` event before the message finishes.

Scenario:

- Delta with `reasoning_text` opens a thinking block (`thinkingBlockOpen = true`)
- Final delta carries only `finish_reason` (no `reasoning_opaque`, no other content)
- `handleFinish` closes `contentBlockOpen` but leaves `thinkingBlockOpen = true`
- `message_stop` is emitted without closing the thinking block

Apply this diff:

```diff
 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
     }
     handleReasoningOpaque(choice.delta, events, state)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
- src/services/copilot/create-chat-completions.ts (3): Choice (88-93), ChatCompletionChunk (51-70), Delta (72-86)
- src/routes/messages/anthropic-types.ts (2): AnthropicStreamState (196-208), AnthropicStreamEventData (185-193)
- src/routes/messages/utils.ts (1): mapOpenAIStopReasonToAnthropic (3-16)
🔇 Additional comments (4)
src/routes/messages/stream-translation.ts (4)
1-47: Refactoring improves code organization

The modular handler approach (splitting into `handleMessageStart`, `handleThinkingText`, `handleContent`, `handleToolCalls`, `handleFinish`) makes the streaming translation logic clearer and easier to maintain. The imports of `Choice` and `Delta` types are appropriate.

96-199: Content and tool call handling looks correct

The `handleToolCalls` and `handleContent` functions properly invoke `closeThinkingBlockIfOpen` before processing their respective content types, ensuring thinking blocks are closed at the right boundaries. The block lifecycle management (opening, emitting deltas, closing, incrementing indices) follows the expected pattern.

201-232: Message start handling is correct

The `handleMessageStart` function correctly emits the `message_start` event once, with proper initialization of token counters and cache metadata.

321-343: Helper function correctly manages thinking block closure

The `closeThinkingBlockIfOpen` helper properly emits `signature_delta` with an empty string and `content_block_stop`, then increments the index and resets the state flag. This is the correct pattern for closing thinking blocks.
Force-pushed from ec66480 to 9adc6e0
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/routes/messages/stream-translation.ts (2)
269-315: Thinking text streaming generally coherent, but tightly coupled to finish logic

The `handleThinkingText` logic for opening/extending a thinking block and optionally attaching a `signature_delta` when `reasoning_opaque` is present is consistent on its own, and it correctly bumps `contentBlockIndex` and resets `thinkingBlockOpen` on closure.

The main caveats are its interaction with `handleFinish` (thinking block not closed when only `finish_reason` arrives, and potential double-signature when both `reasoning_text` and `reasoning_opaque` appear on the final chunk), which I've detailed in the `handleFinish` comment. Any changes there should be coordinated with this function to avoid duplicated or missing thinking closures.
49-94: Thinking block may remain open at finish, and reasoning_opaque can be double-emitted

Two intertwined issues around finish handling:

1. Open thinking block not closed on finish

- `handleFinish` only closes `state.contentBlockOpen`.
- If you've streamed one or more `reasoning_text` chunks (so `state.thinkingBlockOpen === true`) and then receive a final chunk that carries only `finish_reason` (no `content`, no `tool_calls`, no `reasoning_opaque`), the thinking block never gets a `content_block_stop` before `message_stop`. `closeThinkingBlockIfOpen` is never called from here.

2. Double processing of `reasoning_opaque` on final chunks

- `handleThinkingText` can already consume `reasoning_opaque` when it appears together with `reasoning_text`, emitting a `signature_delta` and `content_block_stop` and advancing `contentBlockIndex`.
- `handleFinish` then unconditionally calls `handleReasoningOpaque(choice.delta, …)`. For a final delta that contains both `reasoning_text` and `reasoning_opaque`, this opens a second thinking block with an empty `thinking_delta` plus another `signature_delta`, effectively duplicating the signature on a new block.

These behaviors match a previously raised concern and still look fragile with respect to upstream reasoning delta shapes.

A safer approach would be along the lines of:

- In `handleFinish`, before emitting the `message_delta`/`message_stop`, explicitly close any open thinking block, ideally attaching `choice.delta.reasoning_opaque` to that block (if present) rather than starting a new one.
- Only fall back to `handleReasoningOpaque` when there was no open thinking block for this delta (e.g., a reasoning-opaque-only final chunk).
- Ensure that chunks which carry only `finish_reason` still cause a thinking block to be closed if `state.thinkingBlockOpen` is true.

This keeps every thinking block explicitly terminated and avoids emitting duplicate signature-only thinking blocks for the same reasoning payload.
🧹 Nitpick comments (2)
src/routes/messages/stream-translation.ts (2)
230-267: Reasoning-opaque helper currently assumes finish-only usage

`handleReasoningOpaque` emits a complete thinking block (start → empty thinking_delta → signature_delta → stop) at the current `contentBlockIndex` but never increments the index afterward. With the current code it is only called from `handleFinish`, so no subsequent blocks will reuse that index in the same message and this is functionally safe.

However, if this helper is ever reused earlier in the stream (e.g., from `handleToolCalls` or `handleContent`), not incrementing `state.contentBlockIndex` after the `content_block_stop` will cause index collisions with later blocks. It would be safer to either:

- Increment `state.contentBlockIndex++` after the stop event, or
- Document that this helper is strictly for "final signature at finish" and should not be used mid-stream.
317-338: Thinking-block closer matches state model, but signature payload is hard-coded

`closeThinkingBlockIfOpen` correctly:

- Emits a `signature_delta` followed by `content_block_stop`.
- Increments `contentBlockIndex`.
- Resets `thinkingBlockOpen`.

The only questionable part is that it always uses `signature: ""`. If upstream can ever provide a final `reasoning_opaque` token on the same chunk that triggers closure, you may eventually want a variant that takes the current `Delta` and uses the real opaque signature here rather than an empty string, as sketched below. For now this is acceptable as a fallback closer, but it's another place that will need adjustment if you rework finish-time reasoning handling.
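A hypothetical variant that threads the delta through, per the note above:

```ts
// Sketch: use the chunk's real opaque signature when closing, falling back to
// "" when the closing chunk carries none.
function closeThinkingBlockIfOpen(
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
  delta?: Delta,
): void {
  if (!state.thinkingBlockOpen) return
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: {
        type: "signature_delta",
        signature: delta?.reasoning_opaque ?? "",
      },
    },
    {
      type: "content_block_stop",
      index: state.contentBlockIndex,
    },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}
```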
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
- src/services/copilot/create-chat-completions.ts (3): Choice (88-93), ChatCompletionChunk (51-70), Delta (72-86)
- src/routes/messages/anthropic-types.ts (2): AnthropicStreamState (196-208), AnthropicStreamEventData (185-193)
- src/routes/messages/utils.ts (1): mapOpenAIStopReasonToAnthropic (3-16)
🔇 Additional comments (5)
src/routes/messages/stream-translation.ts (5)
1-5: Typed imports for streaming deltas look correct

Importing `ChatCompletionChunk`, `Choice`, and `Delta` from the shared Copilot types keeps this translator aligned with the upstream chunk shape; no issues here.

23-47: Single-choice assumption and handler ordering

`translateChunkToAnthropicEvents` always takes `chunk.choices[0]` and ignores any additional choices, and processes the delta in the fixed order: thinking → content → tools → finish. If Copilot ever streams multiple choices, they'll be dropped here, but if the API contract guarantees a single streaming choice this is fine. The handler ordering also seems consistent with "reasoning first, then content/tools, then finish".

96-154: Tool call streaming handling is consistent with Anthropic tool_use blocks

Tool call handling looks solid:

- Closes any open thinking block before tools via `closeThinkingBlockIfOpen`.
- Closes an existing content block before starting a new `tool_use` block.
- Tracks tool calls by OpenAI `tool_calls[index]` and maps them to Anthropic `content_block_start`/`input_json_delta` events, supporting incremental `arguments` deltas.

Given the current state model, this should generate valid `tool_use` blocks without index collisions.

156-195: Content block handling correctly separates text from tools/thinking

The content handler:

- Closes any open thinking block before text, preserving the "reasoning first, then visible text" invariant.
- Detects and closes an open tool block via `isToolBlockOpen` before starting a text block.
- Lazily starts a text block and appends `text_delta` events for `delta.content`.

This matches Anthropic's expectations for text content blocks and keeps block indices advancing correctly.

197-228: message_start initialization and usage wiring look good

`handleMessageStart` emits a single `message_start` per stream, initializes usage with effective input tokens (excluding cached tokens), and defers non-zero `output_tokens` to the final `message_delta`. This matches the intended pattern of "lightweight start, accurate usage at finish".
…g order when stream=false and exclude reasoning_opaque from token calculation in calculateMessageTokens
Force-pushed from 3f52575 to 3fa5519
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/routes/messages/stream-translation.ts (1)
49-96: Critical: Thinking blocks are not explicitly closed before message finishes

A previous review flagged this issue and marked it as addressed (commit ec66480), but the fix is not present in the current code.

The bug: When `reasoning_text` deltas are followed by a chunk containing only `finish_reason` (no `content`, `tool_calls`, or `reasoning_text`), `state.thinkingBlockOpen` remains `true`, but `handleFinish` only closes `contentBlockOpen` (line 59). The thinking block never receives an explicit `content_block_stop` before `message_stop` (lines 91-93), violating Anthropic's streaming protocol.

Example flow:

- Delta 1: `{ reasoning_text: "thinking..." }` → opens thinking block, sets `thinkingBlockOpen = true`
- Delta 2: `{ finish_reason: "stop" }` → `handleFinish` emits `message_stop` without closing the thinking block

Anthropic's protocol requires all content blocks (including thinking blocks) to be explicitly terminated with `content_block_stop` before `message_stop`.

Apply this diff to close any open thinking block before emitting the final message events:

```diff
 function handleFinish(
   choice: Choice,
   state: AnthropicStreamState,
   context: {
     events: Array<AnthropicStreamEventData>
     chunk: ChatCompletionChunk
   },
 ) {
   const { events, chunk } = context
   if (choice.finish_reason && choice.finish_reason.length > 0) {
+    // Close any open thinking block before finishing
+    closeThinkingBlockIfOpen(state, events)
+
     if (state.contentBlockOpen) {
       const toolBlockOpen = isToolBlockOpen(state)
       context.events.push({
         type: "content_block_stop",
         index: state.contentBlockIndex,
       })
       state.contentBlockOpen = false
       state.contentBlockIndex++
       if (!toolBlockOpen) {
         handleReasoningOpaque(choice.delta, events, state)
       }
     }
     events.push(
       {
         type: "message_delta",
         delta: {
           stop_reason: mapOpenAIStopReasonToAnthropic(choice.finish_reason),
           stop_sequence: null,
         },
         usage: {
           input_tokens:
             (chunk.usage?.prompt_tokens ?? 0)
             - (chunk.usage?.prompt_tokens_details?.cached_tokens ?? 0),
           output_tokens: chunk.usage?.completion_tokens ?? 0,
           ...(chunk.usage?.prompt_tokens_details?.cached_tokens !== undefined
             && {
               cache_read_input_tokens:
                 chunk.usage.prompt_tokens_details.cached_tokens,
             }),
         },
       },
       {
         type: "message_stop",
       },
     )
   }
 }
```
🧹 Nitpick comments (1)
src/routes/messages/stream-translation.ts (1)
250-288: Simplify signature-only thinking blocks by removing the unnecessary empty `thinking_delta`

Lines 269-271 emit a `thinking_delta` with an empty string before the `signature_delta`, but Anthropic's streaming protocol does not require this. A thinking block may contain only `content_block_start`, `signature_delta`, and `content_block_stop` without any `thinking_delta` event.

Remove the empty `thinking_delta` (lines 269-271) to reduce unnecessary events in the stream:

```diff
 {
   type: "content_block_start",
   index: state.contentBlockIndex,
   content_block: {
     type: "thinking",
     thinking: "",
   },
 },
-{
-  type: "content_block_delta",
-  index: state.contentBlockIndex,
-  delta: {
-    type: "thinking_delta",
-    thinking: "",
-  },
-},
 {
   type: "content_block_delta",
   index: state.contentBlockIndex,
   delta: {
     type: "signature_delta",
     signature: delta.reasoning_opaque,
   },
 },
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/routes/messages/stream-translation.ts (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/stream-translation.ts (3)
- src/services/copilot/create-chat-completions.ts (3): Choice (88-93), ChatCompletionChunk (51-70), Delta (72-86)
- src/routes/messages/anthropic-types.ts (2): AnthropicStreamState (196-208), AnthropicStreamEventData (185-193)
- src/routes/messages/utils.ts (1): mapOpenAIStopReasonToAnthropic (3-16)
Pull request overview
This PR adds support for reasoning/thinking capabilities by introducing the translation of Anthropic "thinking" blocks to OpenAI's reasoning_text and reasoning_opaque fields. The changes enable richer reasoning metadata to flow between Anthropic and OpenAI message formats in both streaming and non-streaming modes, with support for thinking budgets and proper token counting exclusions.
Key Changes:
- Added `signature` field to `AnthropicThinkingBlock` and corresponding streaming state tracking with `thinkingBlockOpen`
- Implemented bidirectional translation between Anthropic thinking blocks and OpenAI reasoning fields (`reasoning_text`, `reasoning_opaque`)
- Introduced thinking budget calculation and model-specific filtering for Claude models
- Updated tokenizer to exclude `reasoning_opaque` from token counts
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.
Summary per file:

| File | Description |
|---|---|
| src/routes/messages/anthropic-types.ts | Added signature field to AnthropicThinkingBlock and thinkingBlockOpen to stream state |
| src/routes/messages/non-stream-translation.ts | Implemented thinking budget calculation, signature extraction, and model-specific filtering for Claude |
| src/routes/messages/stream-translation.ts | Added streaming handlers for thinking text, reasoning opaque, and thinking block lifecycle management |
| src/routes/messages/handler.ts | Initialized thinkingBlockOpen state for streaming |
| src/services/copilot/create-chat-completions.ts | Added reasoning_text and reasoning_opaque to Delta, ResponseMessage, and Message interfaces; added thinking_budget parameter |
| src/services/copilot/get-models.ts | Added max_thinking_budget and min_thinking_budget to ModelSupports interface |
| src/lib/tokenizer.ts | Excluded reasoning_opaque from token count calculations |
| src/start.ts | Set Bun server idleTimeout to 0 |
| tests/anthropic-request.test.ts | Added signature field to test cases and updated assertions to check reasoning_text |
| tests/anthropic-response.test.ts | Added thinkingBlockOpen initialization to stream state in tests |
Diff context:

```ts
  })
}

// handle for claude model
```
Copilot AI · Dec 10, 2025
The comment "handle for claude model" is unclear and doesn't explain what the specific edge case is or why empty content with reasoning_opaque needs special handling. Consider updating to something like: "Handle Claude models that emit reasoning signature after thinking block with empty content delta".
Suggested change:

```diff
-// handle for claude model
+// Handle Claude models that emit reasoning signature after thinking block with empty content delta
```
Diff context:

```ts
} else if (model.startsWith("claude-opus-4-")) {
  return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
```
Copilot AI · Dec 10, 2025
The change from model.startsWith("claude-opus-") to model.startsWith("claude-opus-4-") makes the pattern more restrictive. This means models like "claude-opus-3" or "claude-opus-5" would no longer match this condition. If this is intentional to only handle claude-opus-4 variants, the change is correct. However, verify this doesn't break compatibility with other claude-opus versions.
Suggested change:

```diff
 } else if (model.startsWith("claude-opus-4-")) {
   return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
+} else if (model.startsWith("claude-opus-")) {
+  return model.replace(/^claude-opus-.*/, "claude-opus")
```
Diff context:

```ts
// Process all choices to extract text and tool use blocks
for (const choice of response.choices) {
  const textBlocks = getAnthropicTextBlocks(choice.message.content)
  const thingBlocks = getAnthropicThinkBlocks(
```
Copilot AI · Dec 10, 2025
Variable name thingBlocks appears to be a typo and should be thinkBlocks or thinkingBlocks to be consistent with the function name getAnthropicThinkBlocks and the purpose (processing thinking blocks).
Diff context:

```diff
 function handleReasoningOpaque(
   delta: Delta,
   events: Array<AnthropicStreamEventData>,
   state: AnthropicStreamState,
 ) {
   if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
     events.push(
       {
-        type: "message_stop",
+        type: "content_block_start",
         index: state.contentBlockIndex,
         content_block: {
           type: "thinking",
           thinking: "",
         },
       },
       {
         type: "content_block_delta",
         index: state.contentBlockIndex,
         delta: {
           type: "thinking_delta",
           thinking: "",
         },
       },
       {
         type: "content_block_delta",
         index: state.contentBlockIndex,
         delta: {
           type: "signature_delta",
           signature: delta.reasoning_opaque,
         },
       },
       {
         type: "content_block_stop",
         index: state.contentBlockIndex,
       },
     )
     state.contentBlockIndex++
   }
```
Copilot AI · Dec 10, 2025
The handleReasoningOpaque function doesn't set state.thinkingBlockOpen = true when opening a thinking block, but it does increment state.contentBlockIndex. This could lead to inconsistent state tracking. Consider setting state.thinkingBlockOpen = true after pushing the content_block_start event, or ensure the thinking block lifecycle is properly tracked.
Diff context:

```ts
function handleThinkingText(
  delta: Delta,
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
) {
  if (delta.reasoning_text && delta.reasoning_text.length > 0) {
    if (!state.thinkingBlockOpen) {
      events.push({
        type: "content_block_start",
        index: state.contentBlockIndex,
        content_block: {
          type: "thinking",
          thinking: "",
        },
      })
      state.thinkingBlockOpen = true
    }

    events.push({
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: {
        type: "thinking_delta",
        thinking: delta.reasoning_text,
      },
    })
  }
}

function closeThinkingBlockIfOpen(
  state: AnthropicStreamState,
  events: Array<AnthropicStreamEventData>,
): void {
  if (state.thinkingBlockOpen) {
    events.push(
      {
        type: "content_block_delta",
        index: state.contentBlockIndex,
        delta: {
          type: "signature_delta",
          signature: "",
        },
      },
      {
        type: "content_block_stop",
        index: state.contentBlockIndex,
      },
    )
    state.contentBlockIndex++
    state.thinkingBlockOpen = false
  }
}
```
Copilot AI · Dec 10, 2025
The new streaming translation functions (`handleThinkingText`, `handleReasoningOpaque`, `closeThinkingBlockIfOpen`) lack test coverage. Consider adding tests to verify the following (a sketch of the first case follows this list):
- Thinking blocks are opened/closed correctly in streaming
- Signature deltas are emitted properly
- State transitions (thinkingBlockOpen) work correctly
- Empty content with reasoning_opaque is handled properly
- Interaction between thinking blocks and content/tool blocks
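A sketch of the first case, assuming Bun's test runner, that `translateChunkToAnthropicEvents` returns the emitted events, and a relative import path; the state and chunk literals are abbreviated stand-ins for the real types:

```ts
import { expect, test } from "bun:test"

// Hypothetical import path; adjust to the actual tests/ layout.
import { translateChunkToAnthropicEvents } from "../src/routes/messages/stream-translation"

test("reasoning_text opens a thinking block and flips thinkingBlockOpen", () => {
  // Abbreviated literals: real AnthropicStreamState / ChatCompletionChunk
  // carry more fields, stubbed here for illustration.
  const state: any = {
    messageStartSent: true,
    contentBlockIndex: 0,
    contentBlockOpen: false,
    thinkingBlockOpen: false,
    toolCalls: {},
  }
  const chunk: any = {
    choices: [{ delta: { reasoning_text: "pondering..." }, finish_reason: null }],
  }

  const events = translateChunkToAnthropicEvents(chunk, state)

  expect(state.thinkingBlockOpen).toBe(true)
  expect(events.some((e: any) => e.type === "content_block_start")).toBe(true)
})
```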
Diff context:

```ts
  })
  state.contentBlockOpen = false
  state.contentBlockIndex++
  if (!toolBlockOpen) {
```
Copilot AI · Dec 10, 2025
In handleFinish, after closing a content block, handleReasoningOpaque is called with choice.delta. However, when finish_reason is present, the delta typically doesn't contain new reasoning_opaque data - this was likely already processed in earlier chunks. This could result in emitting an unnecessary empty thinking block at the end. Consider checking if choice.delta.reasoning_opaque exists before calling handleReasoningOpaque, or verify this behavior matches the expected Anthropic streaming protocol.
Suggested change:

```diff
-if (!toolBlockOpen) {
+if (!toolBlockOpen && choice.delta.reasoning_opaque) {
```
Diff context:

```diff
 if (
   delta.content === ""
   && delta.reasoning_opaque
   && delta.reasoning_opaque.length > 0
 ) {
   events.push(
     {
-      type: "message_delta",
+      type: "content_block_delta",
+      index: state.contentBlockIndex,
       delta: {
-        stop_reason: mapOpenAIStopReasonToAnthropic(choice.finish_reason),
-        stop_sequence: null,
+        type: "signature_delta",
+        signature: delta.reasoning_opaque,
       },
     },
     {
       type: "content_block_stop",
       index: state.contentBlockIndex,
     },
   )
   state.contentBlockIndex++
   state.thinkingBlockOpen = false
 }
```
Copilot AI · Dec 10, 2025
When `delta.content === ""`, this special case closes the thinking block but doesn't check whether `state.thinkingBlockOpen` is true. If `thinkingBlockOpen` is false, this logic would try to close a block that was never opened in the streaming state, potentially emitting incorrect events. Consider adding a check, `if (state.thinkingBlockOpen) { ... }`, to ensure consistency (sketched below).
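A sketch of that guard, wrapped around the claude-specific handling shown above; the emitted events are elided:

```ts
// Only close a thinking block that the streaming state actually opened.
if (
  state.thinkingBlockOpen
  && delta.content === ""
  && delta.reasoning_opaque
  && delta.reasoning_opaque.length > 0
) {
  // ...emit signature_delta + content_block_stop as above...
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}
```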
Diff context:

```ts
fetch: server.fetch as ServerHandler,
port: options.port,
bun: {
  idleTimeout: 0,
```
Copilot AI · Dec 10, 2025
Setting idleTimeout: 0 disables the idle timeout completely, which means connections will never be closed due to inactivity. This could lead to resource exhaustion if many idle connections accumulate. Consider using a reasonable timeout value (e.g., 60000 ms for 60 seconds) instead, unless there's a specific requirement for infinite idle timeout. Document the reason if this is intentional.
Suggested change:

```diff
-idleTimeout: 0,
+// Set idleTimeout to 60 seconds to avoid resource exhaustion from idle connections
+idleTimeout: 60000,
```
Diff context:

```ts
function getThinkingBudget(
  payload: AnthropicMessagesPayload,
  model: Model | undefined,
): number | undefined {
  const thinking = payload.thinking
  if (model && thinking) {
    const maxThinkingBudget = Math.min(
      model.capabilities.supports.max_thinking_budget ?? 0,
      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
    )
    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
      return Math.min(thinking.budget_tokens, maxThinkingBudget)
    }
  }
  return undefined
}
```
Copilot AI · Dec 10, 2025
The new `getThinkingBudget` function and `thinking_budget` parameter lack test coverage. Consider adding tests to verify the following (a sketch follows this list):
- Correct calculation when model supports thinking budget
- Capping at model's max_thinking_budget
- Capping at (max_output_tokens - 1)
- Returns undefined when model doesn't support thinking budget
- Returns undefined when payload.thinking is not provided
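A sketch of the capping cases, assuming `getThinkingBudget` were exported for testing (it currently is not, so the import is hypothetical); the model literal mirrors the capability fields used in the function above:

```ts
import { expect, test } from "bun:test"

// Hypothetical export/import; model and payload literals are stubbed.
import { getThinkingBudget } from "../src/routes/messages/non-stream-translation"

const model: any = {
  capabilities: {
    supports: { max_thinking_budget: 8192, min_thinking_budget: 1024 },
    limits: { max_output_tokens: 4096 },
  },
}

test("caps the budget at min(max_thinking_budget, max_output_tokens - 1)", () => {
  const payload: any = { thinking: { budget_tokens: 100_000 } }
  expect(getThinkingBudget(payload, model)).toBe(4095)
})

test("returns undefined when payload.thinking is absent", () => {
  expect(getThinkingBudget({} as any, model)).toBeUndefined()
})

test("returns undefined when the model is unknown", () => {
  const payload: any = { thinking: { budget_tokens: 2048 } }
  expect(getThinkingBudget(payload, undefined)).toBeUndefined()
})
```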
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/routes/messages/non-stream-translation.ts (11 hunks)
- src/routes/messages/stream-translation.ts (3 hunks)
- src/services/copilot/create-chat-completions.ts (5 hunks)
- src/services/copilot/get-models.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
- src/routes/messages/stream-translation.ts
- src/routes/messages/non-stream-translation.ts
📚 Learning: 2025-11-24T05:50:12.368Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/responses-stream-translation.ts:451-462
Timestamp: 2025-11-24T05:50:12.368Z
Learning: In src/routes/messages/responses-stream-translation.ts, the handleErrorEvent function intentionally does not call closeAllOpenBlocks before sending an error event to the client. This is acceptable in their error handling design - clients handle error events without requiring balanced content_block_stop events.
Applied to files:
src/routes/messages/stream-translation.ts
🧬 Code graph analysis (2)
src/routes/messages/stream-translation.ts (3)
- src/services/copilot/create-chat-completions.ts (3): Choice (88-93), ChatCompletionChunk (51-70), Delta (72-86)
- src/routes/messages/anthropic-types.ts (2): AnthropicStreamState (196-208), AnthropicStreamEventData (185-193)
- src/routes/messages/utils.ts (1): mapOpenAIStopReasonToAnthropic (3-16)
src/routes/messages/non-stream-translation.ts (4)
- src/lib/state.ts (1): state (20-25)
- src/routes/messages/anthropic-types.ts (4): AnthropicMessagesPayload (3-26), AnthropicThinkingBlock (56-60), AnthropicTextBlock (28-31), AnthropicAssistantContentBlock (67-70)
- src/services/copilot/get-models.ts (1): Model (44-57)
- src/services/copilot/create-chat-completions.ts (1): Message (167-176)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (16)
src/services/copilot/get-models.ts (1)
28-29: LGTM! Type extension for thinking budgets.

The addition of optional `max_thinking_budget` and `min_thinking_budget` fields cleanly extends the ModelSupports interface to accommodate per-model thinking budget constraints, maintaining backward compatibility.

src/services/copilot/create-chat-completions.ts (3)

72-86: LGTM! Delta interface extended for reasoning metadata.

The addition of `reasoning_text` and `reasoning_opaque` optional fields, along with the export statement, cleanly extends the streaming delta capabilities to support reasoning content propagation.

88-93: LGTM! Choice interface extended and exported.

The addition of the `logprobs` field and export statement appropriately extends the streaming choice interface.

117-118: LGTM! Consistent reasoning field additions.

The optional `reasoning_text`, `reasoning_opaque`, and `thinking_budget` fields are consistently added across ResponseMessage, ChatCompletionsPayload, and Message interfaces, maintaining backward compatibility while enabling reasoning metadata propagation.

Also applies to: 155-155, 174-175

src/routes/messages/stream-translation.ts (7)

1-5: LGTM! Import updates for typed handlers.

The addition of Choice and Delta type imports appropriately supports the refactored handler functions with proper typing.

36-46: LGTM! Modular handler orchestration.

The refactored main translation flow cleanly separates concerns into dedicated handlers, improving maintainability and readability. The handler invocation sequence is logical and comprehensive.

49-96: LGTM! Finish handler with proper block closure.

The finish handler correctly closes open content blocks, processes reasoning opaque signatures for non-tool blocks, and emits the final message delta and stop events with proper token usage propagation.

98-158: LGTM! Tool call handler with proper block transitions.

The tool call handler correctly closes thinking blocks, processes reasoning opaque signatures, and manages tool_use block lifecycle with appropriate content_block_start, delta, and stop events.

160-174: LGTM! Reasoning opaque handler for tool transitions.

The function correctly closes non-tool content blocks before emitting reasoning opaque signatures during tool call transitions.

176-239: LGTM! Content handler with claude-specific reasoning opaque handling.

The content handler properly closes thinking blocks before text content, manages text block lifecycle, and includes claude-specific handling for reasoning opaque signatures with empty content.

241-272: LGTM! Remaining handlers implement proper block lifecycle.

All remaining handlers correctly manage their respective block lifecycles:

- `handleMessageStart`: Emits the initial message_start event once
- `handleReasoningOpaque`: Emits complete thinking blocks with signatures and increments the index
- `handleThinkingText`: Opens thinking blocks and streams reasoning text deltas
- `closeThinkingBlockIfOpen`: Properly terminates thinking blocks with signature deltas

Also applies to: 274-312, 314-341, 343-365

src/routes/messages/non-stream-translation.ts (5)

35-37: LGTM! Model lookup for thinking budget computation.

The model lookup logic correctly translates the model name, finds the model in state, and passes it to thinking budget computation. The optional chaining on `state.models?.data` handles cases where models may not be loaded.

57-72: LGTM! Thinking budget computation with proper bounds.

The function correctly computes the thinking budget by respecting both model-specific limits and payload requests, appropriately handling undefined models and returning undefined when thinking is not configured.

169-195: LGTM! Claude-specific thinking block filtering.

The claude-specific filtering correctly ensures that only thinking blocks with valid content and signatures (excluding GPT-style signatures containing "@") are processed, improving cross-model compatibility.

238-240: LGTM! Explicit text-only filtering in content mapping.

The updated filtering correctly isolates text blocks and excludes thinking blocks from content mapping when no images are present.

377-400: LGTM! Thinking block construction preserves both fields.

The function correctly constructs thinking blocks from reasoning metadata:

- When both `reasoning_text` and `reasoning_opaque` are present, they're combined into a single block
- When only one is present, it creates an appropriate block with the available field
- No data loss occurs
@ericc-ch I've also added support for the Claude model thinking feature; this PR has no open objections and can be merged. In Claude Code, you can now toggle thinking on.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/routes/messages/non-stream-translation.ts (1)
57-75: Clamp the final thinking budget so it never exceeds maxThinkingBudget.
The current logic can return a value larger than `maxThinkingBudget` if `model.capabilities.supports.min_thinking_budget` is greater than the computed `maxThinkingBudget` (e.g., misconfigured model caps). In that case:

- `maxThinkingBudget` might be 512
- `min_thinking_budget` might be 1024
- `Math.max(budgetTokens, min_thinking_budget)` will return 1024, overshooting the max you just computed.

To be defensive against inconsistent caps, consider clamping the final value back down to `maxThinkingBudget`:

```diff
-  const maxThinkingBudget = Math.min(
-    model.capabilities.supports.max_thinking_budget ?? 0,
-    (model.capabilities.limits.max_output_tokens ?? 0) - 1,
-  )
-  if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
-    const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
-    return Math.max(
-      budgetTokens,
-      model.capabilities.supports.min_thinking_budget ?? 1024,
-    )
-  }
+  const maxThinkingBudget = Math.min(
+    model.capabilities.supports.max_thinking_budget ?? 0,
+    (model.capabilities.limits.max_output_tokens ?? 0) - 1,
+  )
+  if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
+    const minThinkingBudget =
+      model.capabilities.supports.min_thinking_budget ?? 1024
+    const requested = Math.min(thinking.budget_tokens, maxThinkingBudget)
+    const lowerBound = Math.min(minThinkingBudget, maxThinkingBudget)
+    const clamped = Math.max(requested, lowerBound)
+    return clamped
+  }
```

Also worth adding a couple of unit tests around the edge cases (min > max, missing caps, `budget_tokens` > max) to lock in the intended behavior; a sketch follows.
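A sketch of those tests for Bun's test runner; `getThinkingBudget` is assumed to be exported (it may be private today), and the import path, model cast, and payload shape are illustrative:

```ts
import { describe, expect, it } from "bun:test"

// Hypothetical import; adjust the path and export before using this sketch.
import { getThinkingBudget } from "../src/routes/messages/non-stream-translation"

// Build a minimal Model-shaped object; the cast keeps the sketch compact.
const makeModel = (caps: { max?: number; min?: number; maxOut?: number }) =>
  ({
    capabilities: {
      supports: { max_thinking_budget: caps.max, min_thinking_budget: caps.min },
      limits: { max_output_tokens: caps.maxOut },
    },
  }) as never

describe("getThinkingBudget clamping", () => {
  it("never exceeds the max when min_thinking_budget > max_thinking_budget", () => {
    const budget = getThinkingBudget(
      makeModel({ max: 512, min: 1024, maxOut: 4096 }),
      { thinking: { type: "enabled", budget_tokens: 2048 } } as never,
    )
    expect(budget).toBeLessThanOrEqual(512)
  })

  it("clamps budget_tokens above the max down to the max", () => {
    const budget = getThinkingBudget(
      makeModel({ max: 2048, min: 256, maxOut: 4096 }),
      { thinking: { type: "enabled", budget_tokens: 999_999 } } as never,
    )
    expect(budget).toBe(2048)
  })

  it("returns undefined when model caps are missing", () => {
    const budget = getThinkingBudget(makeModel({}), {
      thinking: { type: "enabled", budget_tokens: 1024 },
    } as never)
    expect(budget).toBeUndefined()
  })
})
```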
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `src/routes/messages/non-stream-translation.ts` (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.
Applied to files:
src/routes/messages/non-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/non-stream-translation.ts (4)
- src/lib/state.ts (1)
  - `state` (20-25)
- src/routes/messages/anthropic-types.ts (4)
  - `AnthropicMessagesPayload` (3-26)
  - `AnthropicThinkingBlock` (56-60)
  - `AnthropicTextBlock` (28-31)
  - `AnthropicAssistantContentBlock` (67-70)
- src/services/copilot/get-models.ts (1)
  - `Model` (44-57)
- src/services/copilot/create-chat-completions.ts (1)
  - `Message` (167-176)
🔇 Additional comments (4)
src/routes/messages/non-stream-translation.ts (4)
32-55: Model normalization and thinking_budget wiring look consistent.
Using `translateModelName` to derive `modelId`, looking up the `Model` from `state.models`, and then:

- passing `modelId` both as the outbound `model` and into `translateAnthropicMessagesToOpenAI`, and
- computing `thinking_budget` via `getThinkingBudget` only when a matching `Model` and `payload.thinking` exist

is a clean setup and avoids sending a thinking budget for unknown/unsupported models. No issues here.
78-86: Claude-specific thinking-block filtering looks reasonable.
Passing `modelId` into `handleAssistantMessage` and then:

- treating models with `modelId.startsWith("claude")` specially, and
- requiring non-empty `thinking` and `signature`, while excluding signatures containing "@" (to avoid misclassifying GPT signatures as Claude reasoning)

is a sensible way to avoid mixing vendors' reasoning metadata. The aggregation of:

- `reasoning_text` as joined `thinking` contents, and
- `reasoning_opaque` as the first non-empty signature

also looks correct for feeding downstream consumers; a compact sketch follows.

Also applies to: 156-199
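A compact sketch of that filter-and-aggregate step, with a hypothetical helper name and a simplified block type:

```ts
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature: string
}

// Keep only Claude-style thinking blocks: non-empty text and signature,
// and skip GPT-style signatures (which contain "@").
function aggregateClaudeReasoning(blocks: Array<ThinkingBlock>) {
  const valid = blocks.filter(
    (b) =>
      b.thinking.length > 0
      && b.signature.length > 0
      && !b.signature.includes("@"),
  )
  if (valid.length === 0) return undefined
  return {
    reasoning_text: valid.map((b) => b.thinking).join("\n"),
    // First non-empty signature wins.
    reasoning_opaque: valid[0].signature,
  }
}
```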
227-245: mapContent correctly strips thinking blocks from visible assistant/user content.
In the no-image path, `mapContent` now filters to `block.type === "text"` before joining, which ensures:

- `thinking` blocks never leak into the visible `content` string, and
- reasoning is only exposed via the dedicated `reasoning_text`/`reasoning_opaque` fields.

This separation is important for clients that expect assistant `content` to be user-facing only; the filter itself is essentially a one-liner (sketched below).
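A sketch with the block union simplified:

```ts
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }

// Only text blocks contribute to the visible content string.
const mapContent = (blocks: Array<ContentBlock>): string =>
  blocks
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("\n")
```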
319-335: Thinking/text/tool aggregation and Anthropic thinking block mapping look good.

- `translateToAnthropic` now builds a single `assistantContentBlocks` array per response, pushing `thinkBlocks`, `textBlocks`, and `toolUseBlocks` per choice.
- `getAnthropicTextBlocks` skips empty strings, avoiding empty `text` blocks.
- `getAnthropicThinkBlocks` now preserves `reasoning_opaque` when `reasoning_text` is present by setting `signature: reasoningOpaque || ""`, and still emits a single `thinking` block when only `reasoning_opaque` is provided.

Overall this gives a consistent round-trip: OpenAI `reasoning_text`/`reasoning_opaque` → Anthropic `thinking` block(s) → back to OpenAI reasoning fields, without dropping the signature anymore.

Also applies to: 343-349, 365-404
…llback VSCode version
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/services/get-vscode-version.ts (1)
33-33: Consider lazy evaluation instead of module-level execution.
The module-level `await` causes a network fetch on every import, potentially delaying module load by up to 5 seconds on timeout. Consider removing this line and invoking `getVSCodeVersion()` only when needed (lazy evaluation), or caching the result if multiple consumers require it; a sketch follows.
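A sketch of the lazy, cached alternative; `getVSCodeVersion` is assumed to be the module's existing fetcher, and the fallback value is illustrative:

```ts
// Illustrative fallback; replace with the module's real constant.
const FALLBACK_VSCODE_VERSION = "1.104.0"

// Assumed to exist in this module: fetches the latest version with a timeout.
declare function getVSCodeVersion(): Promise<string>

// Cache the in-flight promise so the network fetch happens at most once,
// and only when a caller actually needs the version.
let cachedVersion: Promise<string> | undefined

export const getVSCodeVersionCached = (): Promise<string> => {
  cachedVersion ??= getVSCodeVersion().catch(() => FALLBACK_VSCODE_VERSION)
  return cachedVersion
}
```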
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `src/lib/api-config.ts` (1 hunks)
- `src/services/get-vscode-version.ts` (1 hunks)
🔇 Additional comments (2)
src/lib/api-config.ts (2)
10-12: Version bump for plugin/user-agent headers looks consistent.
Updating `COPILOT_VERSION` and deriving both `EDITOR_PLUGIN_VERSION` and `USER_AGENT` from it keeps header values internally consistent. As long as `0.33.5` matches the actual shipped plugin version you want to advertise, this change is fine.
Please double-check that `0.33.5` is the intended public/plugin version string for all clients using these headers.
14-47: API version header bump may affect backend compatibility.
Changing `API_VERSION` to `"2025-10-01"` will be sent in `x-github-api-version` for both Copilot and GitHub API calls. Behavior is unchanged locally, but if this date doesn't correspond to a valid/stable API version on the server side, calls could fail or change semantics.
Confirm that `"2025-10-01"` is a supported GitHub/Copilot API version in your target environment and that you've validated flows (auth, chat, completions, etc.) against it.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/api-config.ts (1)
19-31: Guard against `Bearer undefined` / `vscode/undefined` in headers.
`State.copilotToken` and `State.vsCodeVersion` are optional, but are interpolated unconditionally. Fail fast (or provide defaults) to avoid confusing auth failures.

```diff
 export const copilotHeaders = (state: State, vision: boolean = false) => {
+  if (!state.copilotToken) throw new Error("Missing state.copilotToken")
+  if (!state.vsCodeVersion) throw new Error("Missing state.vsCodeVersion")
   const headers: Record<string, string> = {
     Authorization: `Bearer ${state.copilotToken}`,
     "content-type": standardHeaders()["content-type"],
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `src/lib/api-config.ts` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/lib/api-config.ts (1)
src/lib/state.ts (2)
- `state` (20-25)
- `State` (3-18)
🔇 Additional comments (2)
src/lib/api-config.ts (2)
27-27: Remove unverified concern about the `openai-intent` header value.
The value `conversation-agent` was deliberately set (per git commit message "correct openai-intent header value") and is used consistently across all GitHub Copilot API endpoints. There is no evidence of `conversation-panel` being a valid alternative, and the project targets a single upstream (GitHub Copilot API), not multiple upstreams. This header value has been stable since inception with no reported issues.
10-15: Verify these version values are currently supported upstream; the format is correct, but the versions may be unsupported or deprecated.
`COPILOT_VERSION = "0.35.0"` and `API_VERSION = "2025-10-01"` use GitHub's documented formats (`x-github-api-version` accepts YYYY-MM-DD dates; `GitHubCopilotChat/<version>` is the correct user agent pattern). However, you must confirm these specific values are accepted by the Copilot API endpoint: if they've been deprecated or aren't released yet, requests fail at runtime.
Additionally, line 21 silently passes `Bearer undefined` if `state.copilotToken` is missing, and line 17 does not validate `state.accountType` before constructing the URL.
```diff
 export const copilotBaseUrl = (state: State) =>
-  state.accountType === "individual" ?
-    "https://api.githubcopilot.com"
-  : `https://api.${state.accountType}.githubcopilot.com`
+  `https://api.${state.accountType}.githubcopilot.com`
```
`copilotBaseUrl` should not blindly template `state.accountType` (host correctness + potential SSRF surface).
If `accountType` isn't strictly validated to a small enum, this can generate invalid/unexpected hosts. Consider mapping known values and defaulting/throwing on unknowns.
```diff
 export const copilotBaseUrl = (state: State) =>
-  `https://api.${state.accountType}.githubcopilot.com`
+  `https://api.${normalizeCopilotAccountType(state.accountType)}.githubcopilot.com`
+
+const normalizeCopilotAccountType = (accountType: string) => {
+  // adjust allowed values to your actual supported set
+  if (accountType === "individual" || accountType === "enterprise") return accountType
+  throw new Error(`Unsupported accountType: ${accountType}`)
+}
```

🤖 Prompt for AI Agents
In src/lib/api-config.ts around lines 16 to 18, copilotBaseUrl currently
templates state.accountType directly which can produce invalid hosts or open an
SSRF surface; replace the direct template with a strict mapping from allowed
accountType values (e.g., map 'personal'|'business'|'enterprise' to their exact
host fragments) and either default to a safe known host or throw/return an error
for unknown values, ensuring only validated enum values are used to build the
URL rather than interpolating raw input.
This pull request introduces support for "thinking" blocks and signatures in the Anthropic message translation pipeline, allowing richer reasoning metadata to be passed between Anthropic and OpenAI message formats. The main changes include updates to the message interfaces, translation logic, and streaming event handling to properly process and emit reasoning text and opaque signatures. The tokenizer is also updated to exclude reasoning metadata from token counts.
Anthropic "thinking" block and signature support:
- Added a `signature` field to the `AnthropicThinkingBlock` interface, and updated all relevant translation logic to extract and emit `reasoning_text` and `reasoning_opaque` from OpenAI messages into Anthropic "thinking" blocks. [1] [2] [3] [4] [5] [6] [7] [8] [9]
- Updated streaming translation logic to emit "thinking" blocks and their signatures as separate content blocks, including handling for starting, updating, and stopping these blocks in the event stream. [1] [2]
State and event handling improvements:
- Added `thinkingBlockOpen` state to `AnthropicStreamState` and ensured correct block lifecycle management for both text and thinking blocks in streaming translation. [1] [2] [3] [4]

Message content mapping and translation fixes:

- `mapContent` now filters assistant/user content to text blocks only, so thinking blocks no longer leak into the visible content string.
Tokenizer exclusion for reasoning metadata:

- Excluded the `reasoning_opaque` field from tokenization, preventing it from affecting token usage calculations (a sketch of the idea follows).
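A minimal sketch of the idea, with an illustrative message shape:

```ts
interface Message {
  role: string
  content: string
  reasoning_text?: string
  reasoning_opaque?: string
}

// Drop the opaque signature before token counting: it is provider metadata,
// not model-visible text, so it should not inflate usage numbers.
const excludeReasoningOpaque = ({ reasoning_opaque: _omitted, ...rest }: Message) =>
  rest
```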
Summary by CodeRabbit
New Features
Chores
Tests