Closed
Summary
Request for streaming token generation that allows pausing generation mid-stream to execute tools and append results before continuing. This would enable proper agentic tool-use patterns where models expect inline tool results.
Problem Statement
Current Behavior
In the current Tinker architecture, model generation is atomic:
```python
# tinker_cookbook/rl/rollouts.py
ac_with_logprobs = await policy(ob, stop_condition)    # complete generation
step_result = await env.step(ac_with_logprobs.tokens)  # processed AFTER generation
```
`SamplingClient.sample_async()` returns only the final complete token sequence, not intermediate tokens.
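With only atomic sampling, the closest approximation is a multi-round loop: stop generation at the tool-call tag, run the tool, append its output, and sample again from the extended prefix. A self-contained sketch with a stub sampler (`sample` and `run_tool` are illustrative stand-ins, not Tinker APIs):

```python
def sample(prompt: str) -> str:
    """Stub sampler: emits a tool call until the tool result is in context."""
    if "file contents" not in prompt:
        return '<tool_call>{"command": "cat file.txt"}</tool_call>'
    return "<final_answer>Done!</final_answer>"

def run_tool(call: str) -> str:
    """Stub tool executor."""
    return "[Tool result appended inline] file contents here..."

prompt = "Fix the file."
transcript = ""
for _ in range(4):  # bound the number of rounds
    chunk = sample(prompt + transcript)
    transcript += chunk
    if "<final_answer>" in chunk:
        break
    # Tool result is injected BETWEEN sampling rounds, not mid-stream:
    transcript += run_tool(chunk)

print(transcript)
```

Each round re-submits the entire prefix, so prefill work is repeated for every tool call; the request here is to keep a single generation open and inject the result mid-stream instead.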
The Issue
Models trained with tool-use (e.g., GPT-OSS, function-calling models) expect a specific interaction pattern:
```
Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
System: [Tool result appended inline] file contents here...
Model: <analysis>I see the file contains...</analysis>
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
System: [Tool result appended inline]
Model: <final_answer>Done!</final_answer>
```
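One possible shape for the requested capability is a streaming sampler whose consumer can inject tokens between yields, so that later tokens condition on the tool result. The class and method names below are hypothetical, sketched with a scripted stub in place of a real model:

```python
import asyncio

class StreamingSampler:
    """Stub: replays a scripted token stream and accepts injected text."""
    def __init__(self, script):
        self.script = list(script)
        self.context = []  # the live generation context

    async def stream(self):
        for tok in self.script:
            self.context.append(tok)
            yield tok  # consumer sees each chunk as it is produced

    def inject(self, text):
        # Appended to the live context so subsequent tokens condition on it.
        self.context.append(text)

async def rollout():
    sampler = StreamingSampler([
        '<tool_call>{"command": "cat file.txt"}</tool_call>',
        "<final_answer>Done!</final_answer>",
    ])
    async for tok in sampler.stream():
        if tok.endswith("</tool_call>"):
            sampler.inject("[Tool result appended inline] file contents here...")
    return "".join(sampler.context)

transcript = asyncio.run(rollout())
print(transcript)
```

In a real implementation, `inject` would extend the model's decoding context (e.g. its KV cache) so generation resumes without re-prefilling the prefix.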
But with atomic generation, we get:
```
Model: <analysis>Let me check the file</analysis>
Model: <tool_call>{"command": "cat file.txt"}</tool_call>
Model: [HALLUCINATED] The file probably contains... <-- Model guesses without seeing result
Model: <tool_call>{"command": "echo 'fixed' > file.txt"}</tool_call>
Model: [HALLUCINATED] Command executed successfully
Model: <final_answer>Done!</final_answer>
```
The model hallucinates tool results because it doesn't receive actual feedback inline.
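Whatever the API looks like, the consumer needs to spot the tool-call boundary even when it is split across streamed chunks. A minimal incremental detector, assuming the `<tool_call>` tags from the transcripts above:

```python
import json

OPEN = "<tool_call>"
CLOSE = "</tool_call>"

def find_tool_call(buffer: str):
    """Return the JSON payload of the first complete tool call, else None."""
    start = buffer.find(OPEN)
    end = buffer.find(CLOSE, start)
    if start == -1 or end == -1:
        return None  # tag not yet complete; keep streaming
    return json.loads(buffer[start + len(OPEN):end])

# The close tag may arrive split across token chunks:
buf = ""
call = None
for chunk in ['<tool_call>{"command": ', '"cat file.txt"}</tool', '_call>']:
    buf += chunk
    call = find_tool_call(buf)
    if call is not None:
        break  # pause generation here, run the tool, then resume

print(call)  # {'command': 'cat file.txt'}
```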